A hidden Markov model is called observable if distinct initial laws give
rise to distinct laws of the observation process. Observability implies
stability of the nonlinear filter when the signal process is tight, but
this need not be the case when the signal process is unstable. This paper
introduces a stronger notion of uniform observability which guarantees
stability of the nonlinear filter in the absence of stability assumptions
on the signal. By developing certain uniform approximation properties of
convolution operators, we subsequently demonstrate that the uniform
observability condition is satisfied for various classes of filtering
models with white-noise type observations. This includes the case of
observable linear Gaussian filtering models, so that standard results on
stability of the Kalman-Bucy filter are obtained as a special case.



1. Introduction. In a classic paper, Blackwell and Dubins [2] have
obtained the following remarkably general result. Let $(Y_k)_{k\ge 0}$ be a
discrete time stochastic process which takes values in a Polish space, and
consider the regular conditional probabilities
$P((Y_k)_{k>m}\in\cdot\,|\,Y_0,\ldots,Y_m)$ and
$Q((Y_k)_{k>m}\in\cdot\,|\,Y_0,\ldots,Y_m)$. Then if $P\sim Q$, one can show
that $P$- and $Q$-a.s.

\[
\|P((Y_k)_{k>m}\in\cdot\,|\,Y_0,\ldots,Y_m)
 - Q((Y_k)_{k>m}\in\cdot\,|\,Y_0,\ldots,Y_m)\|_{\mathrm{TV}}
 \xrightarrow{m\to\infty} 0
\]

without any further assumptions on the laws $P$ and $Q$. The interpretation
of Blackwell and Dubins is that $P$ and $Q$ represent the "opinions" of two
individuals about the dynamics of the time series $(Y_k)_{k\ge 0}$. When the
individuals observe an initial portion of the time series $(Y_k)_{k\le m}$,
they update their opinion of the future observations $(Y_k)_{k>m}$ by
Bayesian learning. The result then
guarantees that the opinions of the two individuals will eventually merge, 
2 R. VAN HANDEL

provided the individuals agree on which events can and cannot occur. A 
continuous time counterpart of this result was obtained by Tsukahara [27] 
using the prediction process of F. Knight. 

The result of Blackwell and Dubins typically does not hold when $P$ and
$Q$ are mutually singular, even when the total variation distance
$\|\cdot\|_{\mathrm{TV}}$ is replaced by a weaker measure of proximity.
Motivated by this problem, Diaconis and Freedman [11] investigated a special
class of models with mutually singular measures for which the merging of
opinions still holds in a weak sense. This has led to the investigation of
various notions of merging of probability measures [10] which are compatible
with the topology of weak convergence of probability measures. Indeed, the
result of Blackwell and Dubins shows that the regular conditional
probabilities $P((Y_k)_{k>m}\in\cdot\,|\,(Y_k)_{k\le m})$ and
$Q((Y_k)_{k>m}\in\cdot\,|\,(Y_k)_{k\le m})$ converge toward one another,
even though neither sequence of probability measures is in fact itself
convergent. Particularly when the state space is not compact, proving that
two sequences of probability measures merge can be subtle (see [13],
Section 11.7).

Such considerations play a central role in the present paper. Unlike the 
setting studied by Diaconis and Freedman, we will be content to assume the 
absolute continuity of our probability measures. In contrast to the problem 
studied by Blackwell and Dubins, however, we will consider a setting where 
we do not have access to the full information about the past history of the 
process under consideration, but we are only able to observe a subfiltration. 
Thus, in essence, we are interested in the merging of opinions with partial 
information. We will restrict ourselves to a particular aspect of this problem, 
the stability of the nonlinear filter, which has attracted much attention in 
recent years (see [9] and the references therein). As we will see, this problem 
can be investigated very much in the spirit of the work of Blackwell and 
Dubins in combination with two new ingredients: the merging of probability 
measures in the dual bounded-Lipschitz distance $\|\cdot\|_{\mathrm{BL}}$, as studied in the
fundamental papers of Pachl [22] and Cooper and Schachermayer [7], and 
certain uniform approximation properties of convolution operators. 

We will work chiefly in continuous time (though a general discrete time
result is developed for comparison in Section 3.4). Let $(X_t,Y_t)_{t\ge 0}$
be a Markov additive process in the sense of Çinlar [6]; this means that
under the probability measure $P^\mu$, the processes $(X_t)_{t\ge 0}$ and
$(X_t,Y_t)_{t\ge 0}$ are time-homogeneous Markov processes with initial law
$(X_0,Y_0)\sim\mu\otimes\delta_{\{0\}}$ and that $(Y_t)_{t\ge 0}$ has
conditionally independent increments given $(X_t)_{t\ge 0}$. This is the
standard assumption on a hidden Markov model in continuous time, where
$Y_t$ is the observed component and $X_t$ is the unobserved component. Let us
now define the regular conditional probabilities
$\pi_t^\mu(\cdot) = P^\mu(X_t\in\cdot\,|\,(Y_s)_{s\le t})$, that is,
$\pi_t^\mu$ is the nonlinear filter associated to our model. We are
interested in finding conditions such that $\pi_t^\mu$ and $\pi_t^\nu$ merge
in an appropriate sense as $t\to\infty$ for different initial measures
$\mu,\nu$.
Remark 1.1. Using similar methods, one could also investigate the
merging of the full predictive distributions
$P^\mu((X_r)_{r\ge t}\in\cdot\,|\,(Y_s)_{s\le t})$. In the present paper,
however, we will restrict ourselves to the study of the nonlinear filter.

Our approach has its origin in the work of Chigansky and Liptser [5], who
discovered independently a corollary of the result of Blackwell and Dubins
and applied it to prove that the filtered estimates of certain functions of
the signal process are always stable. This idea was significantly
generalized by the author in [29], where a characterization of all such
functions was obtained by a functional analytic argument in the case where
the signal state space is compact. In particular, it turns out that the
filters $\pi_t^\mu$ and $\pi_t^\nu$ actually merge in a weak sense whenever
the following observability condition is satisfied:

\[
P^\mu|_{\sigma\{(Y_r)_{r\ge 0}\}} = P^\nu|_{\sigma\{(Y_r)_{r\ge 0}\}}
\quad\text{implies}\quad \mu=\nu,
\]

or, in other words, when distinct initial laws give rise to distinct laws of
the observation process. It is tempting to conjecture that this observability
criterion also leads to stability of the filter when the signal state space
is not compact, as is well known to hold in the special case of linear
Gaussian filtering models [19]. However, as the following example shows,
this conjecture is not correct.

Example 1.2. Consider a signal process $X_t$ on the state space
$[1,\infty[$ defined as $X_t = X_0e^{\lambda t}$ ($\lambda>0$, $X_0\ge 1$),
and consider the observation process

\[
Y_t = \int_0^t h(X_s)\,ds + W_t, \qquad h(x) = x^{-1}.
\]

Here $W_t$ is a Wiener process independent of $X_0$. We claim that this model
is observable, but that there exist $\mu\sim\nu$ such that $\pi_t^\mu$ and
$\pi_t^\nu$ do not merge as $t\to\infty$.

Indeed, observability is easily demonstrated along the lines of [29],
Section 5.1. To prove that $\pi_t^\mu$ and $\pi_t^\nu$ do not merge, set
$f(x) = \cos(\log(x))$ and $t_n = 2\pi n/\lambda$, $n\in\mathbb{N}$. Note that
$f(X_{t_n}) = f(X_0)$ for every $n\in\mathbb{N}$, so that

\[
\pi_{t_n}^\mu(f) = E^\mu(f(X_0)\,|\,(Y_r)_{r\le t_n})
\xrightarrow{n\to\infty} E^\mu(f(X_0)\,|\,(Y_r)_{r<\infty})
\]

for any initial measure $\mu$. It thus suffices to show that
$E^\mu(f(X_0)\,|\,(Y_r)_{r<\infty}) \ne E^\nu(f(X_0)\,|\,(Y_r)_{r<\infty})$
for some $\mu,\nu$. But by the Bayes formula

\[
E^\mu(f(X_0)\,|\,(Y_r)_{r<\infty})
= \frac{\int f(x)\exp\bigl(x^{-1}\int_0^\infty e^{-\lambda s}\,dY_s
        - \tfrac12 x^{-2}\int_0^\infty e^{-2\lambda s}\,ds\bigr)\,\mu(dx)}
       {\int \exp\bigl(x^{-1}\int_0^\infty e^{-\lambda s}\,dY_s
        - \tfrac12 x^{-2}\int_0^\infty e^{-2\lambda s}\,ds\bigr)\,\mu(dx)},
\]

which is clearly not independent of $\mu$.
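
The Bayes formula of Example 1.2 can be evaluated numerically when the prior
is finitely supported. The following is a minimal sketch and not part of the
original argument: the function names, the three-point support and the
particular weights are illustrative assumptions. For a prior with atoms $x_i$
and weights $p_i$, the posterior weight of $x_i$ is proportional to
$p_i\exp(x_i^{-1}I - \tfrac12 x_i^{-2}J)$, where
$I=\int_0^\infty e^{-\lambda s}\,dY_s$ and
$J=\int_0^\infty e^{-2\lambda s}\,ds = 1/(2\lambda)$.

```python
import math
import random


def posterior_mean_f(support, weights, stat_i, lam):
    """E(f(X_0) | Y) for a finitely supported prior, via the Bayes formula.

    stat_i stands for I = int_0^inf exp(-lam*s) dY_s; the second statistic
    J = int_0^inf exp(-2*lam*s) ds equals 1/(2*lam) in closed form.
    """
    stat_j = 1.0 / (2.0 * lam)
    num = 0.0
    den = 0.0
    for x, p in zip(support, weights):
        likelihood = math.exp(stat_i / x - 0.5 * stat_j / x**2)
        num += math.cos(math.log(x)) * likelihood * p  # f(x) = cos(log x)
        den += likelihood * p
    return num / den


def simulate_stat_i(x0, lam, horizon=20.0, steps=20000, seed=0):
    """Euler approximation of I for the model dY = (1/X_s) ds + dW."""
    rng = random.Random(seed)
    dt = horizon / steps
    total = 0.0
    for k in range(steps):
        s = k * dt
        x_s = x0 * math.exp(lam * s)  # X_s = X_0 exp(lam * s)
        dy = dt / x_s + rng.gauss(0.0, math.sqrt(dt))
        total += math.exp(-lam * s) * dy
    return total


if __name__ == "__main__":
    support = [1.0, 2.0, 5.0]
    mu = [0.8, 0.1, 0.1]  # two equivalent priors on the same atoms
    nu = [0.1, 0.1, 0.8]
    stat_i = simulate_stat_i(x0=2.0, lam=1.0)
    print(posterior_mean_f(support, mu, stat_i, 1.0))
    print(posterior_mean_f(support, nu, stat_i, 1.0))
```

Because the total information $\int_0^\infty h(X_s)^2\,ds$ is finite in this
model, the likelihood stays bounded away from zero and infinity, and the two
printed values remain apart however long the observation record is.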
In the present paper we take a somewhat different point of view than
in [29]. The basic idea behind our approach is easily explained. Using the
Markov additive property of our model, it is not difficult to verify that

\[
P^\mu((Y_r - Y_t)_{r\ge t}\in\cdot\,|\,(Y_s)_{s\le t})
 = P^{\pi_t^\mu}((Y_r)_{r\ge 0}\in\cdot).
\]

An argument along the lines of Blackwell and Dubins applies to the left-hand
side of this expression. In particular, we find that

\[
\|P^{\pi_t^\mu}((Y_r)_{r\ge 0}\in\cdot)
 - P^{\pi_t^\nu}((Y_r)_{r\ge 0}\in\cdot)\|_{\mathrm{TV}}
 \xrightarrow{t\to\infty} 0, \qquad P^\mu\text{-a.s.},
\]

whenever $P^\mu|_{\sigma\{(Y_r)_{r\ge 0}\}} \ll
P^\nu|_{\sigma\{(Y_r)_{r\ge 0}\}}$. Now suppose that we could prove that

\[
\|P^{\mu_n}|_{\sigma\{(Y_r)_{r\ge 0}\}}
 - P^{\nu_n}|_{\sigma\{(Y_r)_{r\ge 0}\}}\|_{\mathrm{TV}}
 \xrightarrow{n\to\infty} 0
\quad\text{implies}\quad
\|\mu_n-\nu_n\|_{\mathrm{BL}} \xrightarrow{n\to\infty} 0
\]

for any two sequences of probability measures $\{\mu_n\}$, $\{\nu_n\}$. In
this case, the filtering model is called uniformly observable, and it follows
automatically that

\[
\|\pi_t^\mu - \pi_t^\nu\|_{\mathrm{BL}} \xrightarrow{t\to\infty} 0,
\quad P^\mu\text{-a.s.},
\quad\text{whenever}\quad
P^\mu|_{\sigma\{(Y_r)_{r\ge 0}\}} \ll P^\nu|_{\sigma\{(Y_r)_{r\ge 0}\}},
\]

that is, that the filters merge in the dual bounded-Lipschitz distance. This
argument can be made rigorous with some care, which is done in Theorem
3.3 below.

That a filtering model may be observable but not uniformly observable is 
demonstrated by the counterexample above. It is easily established, however, 
that the two notions are identical when the signal state space is compact 
(Proposition 3.5), so that results of [29] follow as a special case. When the 
state space is not compact, proving that a filtering model is uniformly ob- 
servable is more difficult. We will prove that a large class of diffusion signals 
with white noise type observations is uniformly observable (Section 3.4). In 
addition, we will show that in the linear Gaussian setting, uniform observ- 
ability is equivalent to observability in the sense of linear systems theory 
(Section 3.3). This reproduces a well-known result on the stability of the 
Kalman-Bucy filter [19], which was hitherto out of reach of general stabil- 
ity results for nonlinear filters. The proofs of these facts rely on two key 
technical tools which are developed in the appendices. 

The stability of nonlinear filters has been an active research topic in recent 
years, see, for example, [9] and the references therein. The majority of results 
in this direction assume that the signal process is ergodic or at least tight. 
Such results therefore do not allow us to prove stability of the filter when the 
signal process is unstable, that is, when its mass does not remain localized in 
a compact set. Beside specialized results for the Kalman-Bucy filter, almost 
all existing results in the unstable case either explicitly [4, 8, 18, 21] or 
implicitly [26] rely on some form of "balancing of rates" argument, where 
a rate of contraction must win out over an opposing rate of expansion in order
to give rise to stability of the filter.¹ This invariably implies that stability
of the filter is only proved when the signal to noise ratio of the observations 
is sufficiently high. In contrast, the results in the present paper guarantee 
filter stability for a large class of unstable signals in a manner that is purely 
structural and is completely independent of the signal to noise ratio. This 
suggests that though one may prove filter stability by a balancing of rates 
argument — the latter often even leads to quantitative results on the rate of 
stability — this does not reflect the fundamental mechanism that causes the 
filter to be stable, at least in the models considered here. (The author is not 
aware of an example where the filter loses stability as the signal to noise 
ratio crosses a positive threshold.) 

The remainder of this paper is organized as follows. In Section 2 we in- 
troduce the canonical hidden Markov model and the associated filtering 
problem. Section 3 is devoted to the statement of our main results and 
contains some short proofs. Longer proofs can be found in Sections 4 and 
5. The appendices develop the technical tools that are used in our proofs. 
Appendix A establishes that certain distances between probability kernels 
(including the dual bounded-Lipschitz distance) are in fact measurable. Ap- 
pendix B develops a general result on the merging of probability measures 
in the dual bounded-Lipschitz distance. This result was already obtained in 
a more general setting in [7, 22], but we give here a more elementary proof 
in the Euclidean setting. The latter is all that will be needed in our proofs, 
and also serves to keep the paper more self-contained. Finally, Appendix 
C develops a uniform approximation result for convolution operators which 
plays an important role in proving uniform observability for additive noise 
models. 

2. The hidden Markov model. The purpose of this section is to introduce 
the general class of models which will be studied throughout the paper. We 
also introduce the filtering problem and state some fundamental regularity 
properties. 

2.1. Preliminaries. Before we introduce our hidden Markov model, let 
us fix some notation that will be used throughout the paper. 

Let $S$ be a Polish space endowed with a complete metric $d_S$. We denote by
$\mathcal{B}(S)$ the Borel $\sigma$-field of $S$, and we define the spaces
$B(S)$ of bounded measurable functions, $C_b(S)$ of bounded continuous
functions, $U_b(S)$ of bounded uniformly continuous functions and
$\mathcal{P}(S)$ of Borel probability measures. We always endow $B(S)$,
$C_b(S)$ and $U_b(S)$ with the topology of uniform convergence, and
$\mathcal{P}(S)$ with the topology of weak convergence of probability
measures (recall that the space $\mathcal{P}(S)$ is then itself Polish [23],
Theorems II.6.2 and II.6.5). We denote

\[
\|f\|_L = \sup_{x\ne y}\frac{|f(x)-f(y)|}{d_S(x,y)}, \qquad
\|f\|_\infty = \sup_x |f(x)| \qquad\text{for all } f\in B(S),
\]

and we define $\mathrm{Lip}(S) = \{f\in C_b(S) : \|f\|_\infty\le 1
\text{ and } \|f\|_L\le 1\}$.

¹ An exception is the result of [3], where filter stability is proved under
the strong assumption that the observation noise has compact support. In this
setting the nonlinear filter is itself compactly supported, so that this
reduces essentially to the case of a compact signal state space.

Let $G\subset C_b(S)$ be uniformly bounded, $\sup_{g\in G}\|g\|_\infty<\infty$,
and define

\[
\|\mu-\nu\|_G := \sup_{g\in G}\left|\int g\,d\mu - \int g\,d\nu\right|,
\qquad \mu,\nu\in\mathcal{P}(S).
\]

Then $\|\mu-\nu\|_G$ is a pseudometric on $\mathcal{P}(S)$, and is a metric
whenever $G$ is a separating class [15], Section 3.4. We will frequently
encounter the following special cases: the dual bounded-Lipschitz distance
$\|\mu-\nu\|_{\mathrm{BL}} := \|\mu-\nu\|_{\mathrm{Lip}(S)}$, which metrizes
the Polish space $\mathcal{P}(S)$ [13], Theorem 11.3.3, and the total
variation distance $\|\mu-\nu\|_{\mathrm{TV}} := \|\mu-\nu\|_G$ with
$G = \{f\in C_b(S) : \|f\|_\infty\le 1\}$.

As we will be interested in distances between random probability measures,
it is important to establish that the distance $\|\mu-\nu\|_G$ is a
(measurable) random variable for any pair of probability kernels $\mu,\nu$.
Corollary A.2 in Appendix A establishes that this is the case whenever the
family $G\subset C_b(S)$ is uniformly bounded and equicontinuous; in
particular, we find that $\|\mu-\nu\|_{\mathrm{BL}}$ is measurable for any
pair of probability kernels $\mu,\nu$. That the total variation distance
$\|\mu-\nu\|_{\mathrm{TV}}$ between kernels is measurable is well known; this
follows from the existence of a measurable version of the Radon-Nikodym
derivative (see, e.g., [20], Theorem 3.1).

2.2. Hidden Markov model. Throughout this paper, we consider a continuous
time hidden Markov model with signal state space $E$ and observation state
space $\mathbb{R}^q$ (the observation dimension $q\in\mathbb{N}$ is fixed at
the outset). We presume only that $E$ is Polish and we endow it with a
distinguished complete metric $d$.

Let $\Omega^X = D([0,\infty[;E)$ and $\Omega^Y = D([0,\infty[;\mathbb{R}^q)$
be the spaces of $E$-valued and $\mathbb{R}^q$-valued càdlàg paths,
respectively. We endow $\Omega^X$ and $\Omega^Y$ with the Skorokhod topology
so that they are Polish [15], Theorem 3.5.6. We will work on the probability
space $\Omega = \Omega^X\times\Omega^Y$ with its Borel $\sigma$-field
$\mathcal{F} = \mathcal{B}(\Omega^X\times\Omega^Y)$, and we denote by
$X_t:\Omega\to E$ and $Y_t:\Omega\to\mathbb{R}^q$ the coordinate projections
$X_t(x,y) = x(t)$, $Y_t(x,y) = y(t)$. Furthermore, we define the natural
filtrations

\[
\mathcal{F}_t^X = \sigma\{X_s : s\le t\}, \qquad
\mathcal{F}_t^Y = \sigma\{Y_s : s\le t\}, \qquad
\mathcal{F}_t = \mathcal{F}_t^X\vee\mathcal{F}_t^Y,
\]

and the filtration generated by the observation increments

\[
\mathcal{G}_t^Y = \sigma\{Y_s - Y_0 : s\le t\}.
\]

We will denote $\mathcal{F}^X = \bigvee_{t\ge 0}\mathcal{F}_t^X$, and
$\mathcal{F}^Y$ and $\mathcal{G}^Y$ are defined similarly. The canonical
shift $\theta_t:\Omega\to\Omega$ is defined as
$\theta_t(x,y)(s) = (x(s+t),y(s+t))$.

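We now proceed to impose on this canonical setup the structure of a
hidden Markov model, where $Y_t$ is the observation process and $X_t$ is the
signal process. Our basic assumption is that the pair $(X_t,Y_t)_{t\ge 0}$ is
a time-homogeneous Markov process, whose semigroup we will denote as
$T_t : B(E\times\mathbb{R}^q)\to B(E\times\mathbb{R}^q)$. We therefore presume
that we are given a family $\{P^\mu : \mu\in\mathcal{P}(E)\}\subset
\mathcal{P}(\Omega)$ such that for every $\mu\in\mathcal{P}(E)$, the pair
$(X_t,Y_t)_{t\ge 0}$ is a Markov process under $P^\mu$ relative to the usual
augmentation [24], Section 1.4, of $\mathcal{F}_t$ with respect to the family
$\{P^\mu : \mu\in\mathcal{P}(E)\}$, with semigroup $T_t$ and initial measure
$\mu\otimes\delta_{\{0\}}$. To be precise, let us denote by
$\bar{\mathcal{F}}$, $\bar{\mathcal{F}}^X$, $\bar{\mathcal{F}}^Y$,
$\bar{\mathcal{G}}^Y$ the completions of $\mathcal{F}$, $\mathcal{F}^X$,
$\mathcal{F}^Y$, $\mathcal{G}^Y$ and by $\bar{\mathcal{F}}_t$,
$\bar{\mathcal{F}}_t^X$, $\bar{\mathcal{F}}_t^Y$, $\bar{\mathcal{G}}_t^Y$ the
usual augmentations of $\mathcal{F}_t$, $\mathcal{F}_t^X$, $\mathcal{F}_t^Y$,
$\mathcal{G}_t^Y$ with respect to the family
$\{P^\mu : \mu\in\mathcal{P}(E)\}$. We then assume that

\[
E^\mu(f(X_t,Y_t)\,|\,\bar{\mathcal{F}}_s) = (T_{t-s}f)(X_s,Y_s)
\qquad\text{for all } f\in B(E\times\mathbb{R}^q),\ \mu\in\mathcal{P}(E),
\]

whenever $t\ge s\ge 0$, and that

\[
E^\mu(f(X_t,Y_t)) = \int (T_tf)(x,0)\,\mu(dx)
\qquad\text{for all } f\in B(E\times\mathbb{R}^q),\ \mu\in\mathcal{P}(E).
\]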

Before we proceed, two remarks are in order. 

Remark 2.1. When $E$ is locally compact and $T_t$ is Feller, one can
always construct the family $P^\mu$ with the required properties directly
from the semigroup $T_t$; see, for example, [15]. As we have only assumed that
$E$ is Polish, we impose the existence of the family $P^\mu$ as an assumption.
However, the locally compact Feller case furnishes a broad family of examples
where the construction can be accomplished.

Remark 2.2. The restriction to initial laws of the form
$\mu\otimes\delta_{\{0\}}$ is in essence the requirement that the initial
observation $Y_0$ does not contain any information on the signal. The general
case can be reduced to this setting, however, so there is no loss of
generality in our assumptions (see the remark in [29], Section 2).

We now impose on our Markov model $(X_t,Y_t)_{t\ge 0}$ the fundamental
assumption that it is a Markov additive process in the sense of Çinlar [6],
that is, we require that the semigroup $T_t$ satisfies the following
condition:

\[
\text{For any } f\in B(E\times\mathbb{R}^q),\quad
(T_tS_yf)(x,y) \text{ does not depend on } y.
\]

Here $S_y : B(E\times\mathbb{R}^q)\to B(E\times\mathbb{R}^q)$ is defined as
$(S_yf)(x,z) = f(x,z-y)$. It is not difficult to verify (see also [6]) that
this assumption corresponds to the following two properties: first, the
process $(X_t)_{t\ge 0}$ is a Markov process in its own right [i.e.,
$T_tf\in B(E)$ whenever $f\in B(E)$, where $B(E)$ is seen as a natural
subspace of $B(E\times\mathbb{R}^q)$]; second, under the conditional law of
$(Y_t)_{t\ge 0}$ given $\mathcal{F}^X$, the process $(Y_t)_{t\ge 0}$ has
independent increments. This first property enforces the idea that there is
no feedback in the system, so that the evolution of the signal is not
affected by the observations. The second property enforces the idea that the
observation noise is memoryless. The process $(X_t,Y_t)_{t\ge 0}$ is
therefore a natural continuous time counterpart of the usual discrete time
notion of a hidden Markov model, and the vast majority of continuous time
filtering problems that are encountered in the literature fit in this
framework (see, e.g., [30]).
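
As an illustration, the Markov additive condition can be checked directly for
observations of additive white-noise type. This is only a sketch under
assumptions that are not made in this section: we take
$Y_t = Y_0 + \int_0^t h(X_s)\,ds + W_t$ with $W$ a Wiener process independent
of the Markov signal $X$, and we write $E_{x,y}$ (notation introduced here
purely for illustration) for the expectation under the law of the pair started
at $(X_0,Y_0)=(x,y)$.

```latex
\begin{align*}
(T_t S_y f)(x,y)
  &= E_{x,y}\bigl[(S_y f)(X_t, Y_t)\bigr]
   = E_{x,y}\bigl[f(X_t,\, Y_t - y)\bigr] \\
  &= E_{x,y}\Bigl[f\Bigl(X_t,\ \int_0^t h(X_s)\,ds + W_t\Bigr)\Bigr].
\end{align*}
```

The starting point $y$ of the observations has cancelled in the last
expression, and the joint law of $(X,W)$ does not depend on $y$, so the
right-hand side is a function of $x$ alone, as the condition requires.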

2.3. The filtering problem. Roughly speaking, the problem of nonlinear 
filtering is to compute the conditional distributions $P^\mu(X_t\in\cdot\,|\,\mathcal{F}_t^Y)$. As we
will be dealing with convergence issues, it is essential that we choose "nice" 
versions of the filtered estimates. We cite the following result which provides 
what is needed. 

Lemma 2.3. For every initial measure $\mu\in\mathcal{P}(E)$, there is a
probability kernel
$\pi^\mu : [0,\infty[\,\times\,\Omega^Y\times\mathcal{B}(E)\to[0,1]$ such that:

1. For every $A\in\mathcal{B}(E)$, the process
$(t,\omega)\mapsto\pi^\mu(t,\omega,A)$ is the $\bar{\mathcal{F}}_t^Y$-optional
projection of $(t,\omega)\mapsto I_A(X_t(\omega))$.

2. For every $\omega\in\Omega^Y$, the $\mathcal{P}(E)$-valued sample path
$t\mapsto\pi^\mu(t,\omega,\cdot)$ is càdlàg in the topology of
$\mathcal{P}(E)$.

For simplicity, we denote by $\pi_t^\mu(\cdot)$ the random measure
$\omega\mapsto\pi^\mu(t,\omega,\cdot)$.

Proof. See [30], Proposition 1 or [17], Theorem A.3. □

As we will deal with different initial measures, the uniqueness of $\pi^\mu$
is of interest. The following result is straightforward due to the
separability of $E$.

Lemma 2.4. The kernel $\pi^\mu$ is unique up to
$P^\mu|_{\bar{\mathcal{F}}^Y}$-indistinguishability.

Proof. As $E$ is Polish, we can find a countable algebra
$\{A_n\}\subset\mathcal{B}(E)$ such that
$\mathcal{B}(E) = \sigma\{A_n : n\in\mathbb{N}\}$. Let $\pi^\mu$ and
$\tilde\pi^\mu$ be two kernels that satisfy the definition of the previous
lemma. To show that $\pi^\mu(t,\omega,\cdot) = \tilde\pi^\mu(t,\omega,\cdot)$,
it suffices to show that $\pi^\mu(t,\omega,A_n) = \tilde\pi^\mu(t,\omega,A_n)$
for all $n$. But by the uniqueness of the optional projection up to
evanescence [24], Theorem IV.5.6, we can clearly find a set
$B\in\bar{\mathcal{F}}^Y$ of $P^\mu$-full measure such that this holds for all
$t\in[0,\infty[$ and $\omega\in B$. □
3. Main results. The purpose of this section is to state our main results.
We also give some short proofs; the remaining proofs appear in the following 
sections. 

3.1. Uniform observability and filter stability. Let us begin by introduc- 
ing the central result of this paper. We are interested in characterizing the 
stability of the filter, that is, the dependence of $\pi_t^\mu$ on $\mu$ as $t\to\infty$. Our
general result relates this question to the following uniform notion of ob- 
servability. 

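Definition 3.1. Let $G\subset C_b(E)$ be uniformly bounded and
equicontinuous. The filtering model is said to be $G$-uniformly observable if
for $\{\mu_n\},\{\nu_n\}\subset\mathcal{P}(E)$

\[
\|P^{\mu_n}|_{\mathcal{F}^Y} - P^{\nu_n}|_{\mathcal{F}^Y}\|_{\mathrm{TV}}
 \xrightarrow{n\to\infty} 0
\quad\text{implies}\quad
\|\mu_n-\nu_n\|_G \xrightarrow{n\to\infty} 0.
\]

When $G = \mathrm{Lip}(E)$ the model is simply called uniformly observable.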

In [29], a model is called observable if
$P^\mu|_{\mathcal{F}^Y} = P^\nu|_{\mathcal{F}^Y}$ implies $\mu=\nu$.
Evidently $G$-uniform observability implies observability whenever $G$ is a
separating class. However, uniform observability is strictly stronger than
observability: the model is observable whenever the map
$\mu\mapsto P^\mu|_{\mathcal{F}^Y}$ is injective, while uniform observability
requires in addition that the inverse map is uniformly continuous.²

Remark 3.2. In principle one could define uniform observability in total
variation by choosing $G$ to be the unit ball in $C_b(E)$ (our proofs then
require some modification as this family is not equicontinuous). However,
uniform observability almost never holds in this setting, as is illustrated
by the following toy example.

For $\mu\in\mathcal{P}(\mathbb{R})$, denote by
$P^\mu\in\mathcal{P}(\mathbb{R})$ the law of $Y = X+\xi$ where $X\sim\mu$ and
$\xi\sim N(0,1)$ are independent. Let $\mu_n = \delta_{\{1/n\}}$ and
$\mu = \delta_{\{0\}}$. Then $\|P^{\mu_n}-P^\mu\|_{\mathrm{TV}}\to 0$ as
$n\to\infty$ while $\|\mu_n-\mu\|_{\mathrm{TV}} = 2$ for all $n$. Note that
this entirely reasonable model is observable and even
$\mathrm{Lip}(\mathbb{R})$-uniformly observable, but uniform observability in
total variation fails. Evidently we cannot obtain uniform observability in
total variation when the observations are "smoothing," as is usually the case
in practice, and it is therefore essential to use a smaller class $G$.
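
The toy example can be checked numerically. The sketch below is illustrative
only, and its function names are assumptions of ours. It uses the closed form
for the total variation distance between two unit-variance Gaussians, in the
normalization of Section 2.1 where mutually singular measures are at distance
2, together with the elementary identity
$\|\delta_{\{x\}}-\delta_{\{y\}}\|_{\mathrm{BL}} = \min(2,|x-y|)$ on the real
line.

```python
import math


def tv_gaussian_shift(shift):
    """||N(shift,1) - N(0,1)||_TV with the convention sup over |f| <= 1."""
    return 2.0 * math.erf(abs(shift) / (2.0 * math.sqrt(2.0)))


def tv_dirac(x, y):
    """||delta_x - delta_y||_TV: 0 if the atoms coincide, 2 otherwise."""
    return 0.0 if x == y else 2.0


def bl_dirac(x, y):
    """||delta_x - delta_y||_BL on the real line: min(2, |x - y|)."""
    return min(2.0, abs(x - y))


if __name__ == "__main__":
    # Columns: n, observation-law TV distance, TV and BL distance of priors.
    for n in (1, 10, 100, 1000):
        print(n, tv_gaussian_shift(1.0 / n), tv_dirac(1.0 / n, 0.0),
              bl_dirac(1.0 / n, 0.0))
```

The second column tends to zero while the third stays at 2, which is the
failure of uniform observability in total variation; the fourth column tends
to zero, in line with uniform observability for the class
$\mathrm{Lip}(\mathbb{R})$.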



² We recall the following elementary facts. A map $f:S\to T$ between metric
spaces $(S,d_S)$ and $(T,d_T)$ is called uniformly continuous if for every
$\varepsilon>0$, there exists a $\delta>0$ (depending on $\varepsilon$ only)
such that $d_S(x,y)<\delta$ implies $d_T(f(x),f(y))<\varepsilon$.
Equivalently, $f$ is uniformly continuous if and only if for every pair of
sequences $(x_n)_{n\ge 0}$ and $(y_n)_{n\ge 0}$ such that $d_S(x_n,y_n)\to 0$,
we have $d_T(f(x_n),f(y_n))\to 0$. The proof of this fact is standard and is
therefore omitted.
The following result relates the notion of uniform observability to the
stability of the filter. We will prove this theorem in Section 4. 

Theorem 3.3. Let $G\subset C_b(E)$ be uniformly bounded and equicontinuous,
and suppose that the filtering model is $G$-uniformly observable. Then

\[
\|\pi_t^\mu-\pi_t^\nu\|_G \xrightarrow{t\to\infty} 0,
\quad P^\mu\text{-a.s.},
\quad\text{whenever}\quad
P^\mu|_{\mathcal{F}^Y} \ll P^\nu|_{\mathcal{F}^Y}.
\]

Note that in this result $G$ need not be a separating class. However, we are
typically interested in the case where $G = \mathrm{Lip}(E)$. In the
following subsections, we will introduce various filtering models where
uniform observability can be verified.

Remark 3.4. The condition $P^\mu|_{\mathcal{F}^Y}\ll P^\nu|_{\mathcal{F}^Y}$
always holds when $\mu\ll\nu$, but the latter is not necessary. It could even
be the case that $P^\mu|_{\mathcal{F}^Y}\sim P^\nu|_{\mathcal{F}^Y}$ for every
$\mu,\nu\in\mathcal{P}(E)$, in which case the filter forgets any initial
condition. The latter property is closely related to the notion of
controllability; see [29], Section 7.

3.2. Compact state space. We have seen that observability in the sense 
of [29] is a weaker condition than uniform observability. However, in the 
special case that $E$ is compact and $(X,Y)$ is Feller, observability and uniform
observability are equivalent. This follows directly from the general fact that 
any continuous bijection from a compact metric space to a metric space is a 
uniform homeomorphism. The proof of this fact is elementary and is given 
here for completeness. 

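Proposition 3.5. Suppose that $E$ is compact and that $(X,Y)$ is Feller.
Then observability, that is, the requirement that
$P^\mu|_{\mathcal{F}^Y} = P^\nu|_{\mathcal{F}^Y}$ implies $\mu=\nu$, already
guarantees that the filtering model is uniformly observable.

Proof. Let $\{\mu_n\},\{\nu_n\}\subset\mathcal{P}(E)$ and suppose that
$\|\mu_n-\nu_n\|_{\mathrm{BL}}\not\to 0$. Then we may assume, by passing to a
subsequence if necessary, that $\|\mu_n-\nu_n\|_{\mathrm{BL}}\ge\varepsilon>0$
for all $n$. As $E$ is compact, $\{\mu_n\}$ and $\{\nu_n\}$ are tight and we
may assume, again passing to a subsequence if necessary, that
$\|\mu_n-\mu\|_{\mathrm{BL}}\to 0$ and $\|\nu_n-\nu\|_{\mathrm{BL}}\to 0$ for
some $\mu,\nu\in\mathcal{P}(E)$. By the Feller property, we find that

\[
\|P^{\mu_n}|_{\mathcal{F}^Y} - P^{\nu_n}|_{\mathcal{F}^Y}\|_{\mathrm{BL}}
\xrightarrow{n\to\infty}
\|P^{\mu}|_{\mathcal{F}^Y} - P^{\nu}|_{\mathcal{F}^Y}\|_{\mathrm{BL}}
\]

(see [15], Theorem 4.2.5). But by the observability assumption and
$\|\mu-\nu\|_{\mathrm{BL}}\ge\varepsilon$ we must have
$\|P^{\mu}|_{\mathcal{F}^Y} - P^{\nu}|_{\mathcal{F}^Y}\|_{\mathrm{BL}}>0$, so
that, as the total variation distance dominates the bounded-Lipschitz
distance,
$\|P^{\mu_n}|_{\mathcal{F}^Y} - P^{\nu_n}|_{\mathcal{F}^Y}\|_{\mathrm{TV}}
\not\to 0$. By contradiction,
$\|P^{\mu_n}|_{\mathcal{F}^Y} - P^{\nu_n}|_{\mathcal{F}^Y}\|_{\mathrm{TV}}\to 0$
must imply $\|\mu_n-\nu_n\|_{\mathrm{BL}}\to 0$. □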
As a consequence, observability gives rise to stability of the filter in the
sense of Theorem 3.3 when the signal state space is compact and the filtering
model is Feller. Note that this result could also be obtained from the main
result in [29] by using the Arzelà-Ascoli theorem (as outlined in Appendix
B).



3.3. The Kalman-Bucy filter. Consider the hidden Markov model defined by
the unique martingale problem solution to the stochastic differential
equations

\[
X_t = X_0 + \int_0^t AX_s\,ds + BW_t, \qquad
Y_t = \int_0^t CX_s\,ds + DV_t,
\]

where $E = \mathbb{R}^d$, $A\in\mathbb{R}^{d\times d}$,
$B\in\mathbb{R}^{d\times p}$, $C\in\mathbb{R}^{q\times d}$,
$D\in\mathbb{R}^{q\times r}$ and $W_t$ and $V_t$ are independent $p$- and
$r$-dimensional Wiener processes, respectively. We refer to this hidden
Markov model as the linear Gaussian filtering model. When $X_0$ is Gaussian
and $D$ is invertible the associated filtering problem is solved by the
Kalman-Bucy filter; however, these assumptions are not required for our
purposes.

We begin by stating a variant of a well-known result from linear systems 
theory. 

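Lemma 3.6. The following are equivalent.

1. The $dq\times d$-matrix

\[
O(A,C) := \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{d-1} \end{bmatrix}
\]

has full rank.

2. There is a linear function $f:(\mathbb{R}^q)^k\to\mathbb{R}^d$ such that

\[
f\Bigl(\int_0^{t_1} Ce^{As}x\,ds,\ \ldots,\ \int_0^{t_k} Ce^{As}x\,ds\Bigr) = x
\qquad\text{for all } x\in\mathbb{R}^d
\]

for some finite number of times $t_1,\ldots,t_k\in\mathbb{R}_+$
($k\in\mathbb{N}$).

When this is the case, we say that the pair $\{A,C\}$ is observable.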

Proof. Suppose that $O(A,C)$ has full rank. We begin by noting that

\[
\lim_{t\searrow 0}\frac{1}{t}\int_0^t Ce^{Ar}\,dr = C.
\]

Similarly, we find that

\[
\lim_{s\searrow 0}\lim_{t\searrow 0}\frac{1}{st}
\left[\int_s^{s+t} Ce^{Ar}\,dr - \int_0^t Ce^{Ar}\,dr\right] = CA.
\]

Proceeding along the same lines, we can find for every $\varepsilon>0$ a
finite number of times $t_1(\varepsilon),\ldots,t_k(\varepsilon)$ and a matrix
$H_\varepsilon\in\mathbb{R}^{dq\times kq}$ such that

\[
H_\varepsilon
\begin{bmatrix}
\int_0^{t_1(\varepsilon)} Ce^{As}\,ds \\ \vdots \\
\int_0^{t_k(\varepsilon)} Ce^{As}\,ds
\end{bmatrix}
\xrightarrow{\varepsilon\to 0} O(A,C).
\]

But as $O(A,C)$ has full rank, the matrix on the left-hand side will have full
rank for $\varepsilon$ sufficiently small, and the claim follows in one
direction.

To prove the converse, note that by the Cayley-Hamilton theorem

\[
\int_0^t Ce^{As}\,ds = c_0(t)C + c_1(t)CA + \cdots + c_{d-1}(t)CA^{d-1}
\]

for coefficients $c_j(t)$ depending on $t$ and $A$ only. Therefore, by the
existence of the function $f$, the matrix $O(A,C)$ has a left inverse and
therefore has full rank. □
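
The rank condition in the first part of Lemma 3.6 is straightforward to test
in practice. The following sketch is an illustration only; the helper names
and the double-integrator example are our own assumptions. It assembles
$O(A,C)$ from lists of rows and computes its rank by exact Gaussian
elimination over the rationals, so no numerical tolerance is needed for
integer or rational entries.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def observability_matrix(a, c):
    """Stack C, CA, ..., CA^(d-1) into a (d*q) x d matrix."""
    d = len(a)
    rows, block = [], [list(r) for r in c]
    for _ in range(d):
        rows.extend(block)
        block = mat_mul(block, a)
    return rows


def rank(m):
    """Rank by exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in m]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


def is_observable(a, c):
    """True when O(A, C) has full column rank d."""
    return rank(observability_matrix(a, c)) == len(a)


if __name__ == "__main__":
    a = [[0, 1], [0, 0]]  # double integrator: position and velocity
    print(is_observable(a, [[1, 0]]))  # observe the position
    print(is_observable(a, [[0, 1]]))  # observe the velocity only
```

For the double integrator, observing the position makes the pair observable,
whereas observing only the velocity does not, since the initial position then
never enters the observations.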

We now obtain the following result. 

Proposition 3.7. The linear Gaussian filtering model is uniformly ob- 
servable if and only if {A, C} is observable in the sense of linear systems 
theory. 

Proof. We can solve the equation for $(X_t,Y_t)$ explicitly:

\[
X_t = e^{At}X_0 + \int_0^t e^{A(t-s)}B\,dW_s,
\]
\[
Y_t = \int_0^t Ce^{As}X_0\,ds
 + \int_0^t\!\!\int_0^s Ce^{A(s-r)}B\,dW_r\,ds + DV_t.
\]

Suppose first that $\{A,C\}$ is not observable. Then there exists a nonzero
$v\in\mathbb{R}^d$ such that

\[
\int_0^t Ce^{As}v\,ds = 0 \qquad\text{for all } t\ge 0.
\]

When this is the case, it is easily seen that for any initial law
$\mu\in\mathcal{P}(\mathbb{R}^d)$, the initial law $\mu*\delta_{\{v\}}$ gives
rise to the same law of the observations as does $\mu$. Therefore the model is
certainly not uniformly observable.

Conversely, suppose that the pair $\{A,C\}$ is observable. Let
$t_1,\ldots,t_k$ and $f:(\mathbb{R}^q)^k\to\mathbb{R}^d$ be as in Lemma 3.6.
Then we can write

\[
(Y_{t_1},\ldots,Y_{t_k})
 = \Bigl(\int_0^{t_1} Ce^{As}X_0\,ds,\ \ldots,\
         \int_0^{t_k} Ce^{As}X_0\,ds\Bigr) + \xi,
\]

where $\xi$ is a $kq$-dimensional Gaussian random variable. In particular, the
characteristic function of $\xi$ vanishes nowhere. By Proposition C.2 and the
fact that $f$ is Lipschitz continuous (as it is linear), it is easily
established that

\[
\|P^{\mu_n}|_{\mathcal{F}^Y} - P^{\nu_n}|_{\mathcal{F}^Y}\|_{\mathrm{TV}}
 \xrightarrow{n\to\infty} 0
\quad\text{implies}\quad
\|\mu_n-\nu_n\|_{\mathrm{BL}} \xrightarrow{n\to\infty} 0.
\]

This completes the proof of uniform observability. □

As a corollary, it follows from Theorem 3.3 that if {A, C} is observable, 
then $\|\pi_t^\mu-\pi_t^\nu\|_{\mathrm{BL}}\to 0$ $P^\mu$-a.s. as $t\to\infty$ whenever $P^\mu|_{\mathcal{F}^Y}\ll P^\nu|_{\mathcal{F}^Y}$. This
result is essentially known; see, for example, [19], Section 2. However, previous
proofs rely crucially on the fact that the solution to the filtering problem 
can be explicitly expressed in terms of the Kalman-Bucy filtering equations, 
which are amenable to explicit analysis. In contrast, the Kalman-Bucy filter 
(in the case of unstable signals) has hitherto been out of reach of results on 
filter stability which also apply to nonlinear filtering models. The present 
approach is therefore of significant interest, as it allows us to infer stability 
of the filter directly from the general Theorem 3.3. 

Remark 3.8. The present result differs somewhat from previous stabil- 
ity results for the Kalman-Bucy filter. It is customary to assume control- 
lability in addition to observability, which is replaced in our setting by the 
absolute continuity requirement $P^\mu|_{\mathcal{F}^Y}\ll P^\nu|_{\mathcal{F}^Y}$. It is not difficult to verify
that if the signal is controllable and $D$ is invertible, then $P^\mu|_{\mathcal{F}^Y}\sim P^\nu|_{\mathcal{F}^Y}$ for
every $\mu,\nu\in\mathcal{P}(\mathbb{R}^d)$, so that our result is in fact more general in this sense.
On the other hand, the assumptions in [19], Section 2, are weaker than the 
observability assumption; in particular, detectability suffices (at least when 
D is invertible; see also [29], Appendix A). It would be interesting to obtain 
a generalization of the latter notion to general hidden Markov models, for 
example, by combining Theorem 3.3 with the results in [28]. 

3.4. Diffusion signals. The verification of uniform observability for the 
linear Gaussian filtering model was simplified significantly by the fact that 
the stochastic differential equations which define the model can be solved 
explicitly. In the present subsection we will verify uniform observability for 
a class of nonlinear filtering models, where we do not have this luxury. 
Consequently the conditions for uniform observability will be more stringent 
than in the previous section; in particular, we will recover Proposition 3.7 
as a special case in the setting where $C$ is invertible (in which case $\{A,C\}$
is automatically observable). 

Let E = M3 (i.e., we assume that the signal and observation state space 
dimensions coincide) . We consider a hidden Markov model of the form 



Xt = Xo+ f b{Xs) ds+ [ a{X,) dWs 
Jo Jo 

Yt= [ h{X,)ds + DVu 
Jo 
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where Wt and Vj are independent p- and r-dimensional Wiener processes, 
respectively, and D G W'' , b:W^W,a:W^ W'p , h:W -^ W. In addi- 
tion, we assume that the following conditions hold: 

1. $b$ is globally Lipschitz continuous;

2. $a$ is globally Lipschitz continuous and bounded;

3. $h(x) = Cx + h_0(x)$, where $C$ is an invertible matrix and $\|C^{-1}h_0\|_L < 1$.

Note that under these conditions it is well known that the martingale problem
for $(X, Y)$ has a unique solution, so that our model is well defined.
The proof of the following result can be found in Section 5. 

Theorem 3.9. The filtering model in this section is uniformly observ- 
able. 

The required form of the observation function h may seem a little odd; 
however, the proof of Theorem 3.9 shows that this is a natural choice. To 
gain a little more insight into this condition, we prove the following lemma. 

Lemma 3.10. Any function $h(x) = Cx + h_0(x)$, where $C$ is invertible
and $\|C^{-1}h_0\|_L < 1$, is bi-Lipschitz, that is, there exist $0 < m < M < \infty$ such
that

$$m\|x-y\| \le \|h(x)-h(y)\| \le M\|x-y\| \quad\text{for all } x,y\in\mathbb{R}^q.$$

Conversely, if $q = 1$ and $h$ is a bi-Lipschitz function, then $h(x) = Cx + h_0(x)$
for some $0 < |C| < \infty$ and Lipschitz function $h_0$ with $\|C^{-1}h_0\|_L < 1$.

Proof. Suppose that $h(x) = Cx + h_0(x)$, where $C$ is an invertible matrix
and $\|C^{-1}h_0\|_L < 1$. Clearly $M := \|h\|_L < \infty$. Moreover, we can estimate

$$\|x-y\| \le \|C^{-1}h(x) - C^{-1}h(y)\| + \|C^{-1}h_0(x) - C^{-1}h_0(y)\|$$

$$\le \|C^{-1}\|\,\|h(x)-h(y)\| + \|C^{-1}h_0\|_L\,\|x-y\|.$$

As $\|C^{-1}h_0\|_L < 1$, we may set $m := (1 - \|C^{-1}h_0\|_L)/\|C^{-1}\|$.

Conversely, let $q = 1$ and suppose that $h$ is bi-Lipschitz with constants
$m < M$. Then in particular $h:\mathbb{R}\to\mathbb{R}$ is a continuous bijection, so that it is
either strictly increasing or strictly decreasing. Define $C := (M+m)/2$ if $h$
is increasing and $C := -(M+m)/2$ if $h$ is decreasing. Then for any $x > y$,
we evidently have

$$(1-\varepsilon)(x-y) \le C^{-1}h(x) - C^{-1}h(y) \le (1+\varepsilon)(x-y),$$

where $\varepsilon := (M-m)/(M+m)$. In particular,

$$\frac{|C^{-1}h(x) - x - (C^{-1}h(y) - y)|}{|x-y|} \le \varepsilon \quad\text{for all } x > y.$$

This estimate consequently holds for all distinct $x,y\in\mathbb{R}$ by symmetry. The result
now follows by noting that $h_0(x) := h(x) - Cx$ satisfies $\|C^{-1}h_0\|_L \le \varepsilon < 1$.
□

Using Lemma 3.10 we find that when the signal state space is the real line,
the filtering model of the present section is uniformly observable whenever
the observation function $h$ is a Lipschitz bijection with Lipschitz inverse
(i.e., bi-Lipschitz). In higher dimensions the condition $h(x) = Cx + h_0(x)$
is stronger than the bi-Lipschitz condition, and enforces the idea that $h(x)$
cannot be "too nonlinear."
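
The converse half of Lemma 3.10 is constructive, and the following minimal sketch illustrates it numerically for $q = 1$. The function $h(x) = 2x + \sin x$ and the helper names are hypothetical choices made here for illustration, not taken from the paper; the difference quotient on a grid only gives a lower estimate of the Lipschitz constant.

```python
import math

def linear_part_and_margin(m, M, increasing=True):
    """Constants from the converse half of Lemma 3.10 (q = 1).

    m, M are bi-Lipschitz constants of h. Returns (C, eps) with
    C = +/-(M + m)/2 and eps = (M - m)/(M + m) < 1.
    """
    if not 0 < m <= M:
        raise ValueError("need 0 < m <= M")
    C = (M + m) / 2 if increasing else -(M + m) / 2
    eps = (M - m) / (M + m)
    return C, eps

def max_difference_quotient(f, grid):
    """Largest slope of f between neighbouring grid points
    (a lower estimate of the Lipschitz constant of f)."""
    return max(abs(f(b) - f(a)) / (b - a) for a, b in zip(grid, grid[1:]))

# Hypothetical example: h(x) = 2x + sin(x), so h'(x) = 2 + cos(x) lies in [1, 3].
def h(x):
    return 2 * x + math.sin(x)

C, eps = linear_part_and_margin(m=1.0, M=3.0)

def scaled_remainder(x):
    """C^{-1} h_0(x), where h_0(x) = h(x) - C x."""
    return (h(x) - C * x) / C

grid = [-10 + 0.001 * k for k in range(20001)]
slope = max_difference_quotient(scaled_remainder, grid)
print(C, eps, slope)
```

In this example the remainder $C^{-1}h_0(x) = \sin(x)/2$ has Lipschitz constant equal to $\varepsilon = 1/2$, so the bound in the lemma is attained.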

Intuitively, one might well expect that for additive noise observation mod- 
els with a strongly invertible observation function h, the filter would be sta- 
ble under only mild conditions on the signal process. This is certainly the 
spirit of Theorem 3.9, but the requirement on h and the assumptions on 
the signal (i.e., that it is a diffusion) are somewhat stronger than one might 
expect to be necessary. Following the approach used in the proof of Theorem 
3.9, the author did not succeed in weakening the requirements of that result. 
For comparison, however, let us briefly discuss a related problem in discrete 
time where a very general result may be obtained. 

Let $E = \mathbb{R}^q$, and let $P:\mathbb{R}^q\times\mathcal{B}(\mathbb{R}^q)\to[0,1]$ be a given transition
probability kernel. On the sequence space $E^{\mathbb{Z}_+}\times F^{\mathbb{Z}_+}$ with the canonical
coordinate projections $X_n(x,y) = x(n)$, $Y_n(x,y) = y(n)$, we define the family of
probability measures $\mathbf{P}^\mu$, $\mu\in\mathcal{P}(\mathbb{R}^q)$ such that $(X_n)_{n\ge0}$ is a Markov chain
with initial measure $X_0\sim\mu$ and transition probability $P$, and such that
$Y_n = h(X_n) + \xi_n$ for every $n\ge0$ where $\xi_n$ is an i.i.d. sequence independent
of $(X_n)_{n\ge0}$. We now define for every $\mu\in\mathcal{P}(\mathbb{R}^q)$ the regular conditional
probabilities

$$\pi^\mu_n(\,\cdot\,) := \mathbf{P}^\mu(X_{n+1}\in\cdot\,|\,Y_0,\ldots,Y_n), \qquad n\ge0.$$

In other words, $\pi^\mu_n$ is the one step predictor of the signal given the
observations.

In the present setting, the following result holds without further
assumptions.

Proposition 3.11. Suppose that the following hold:

1. $h$ possesses a uniformly continuous inverse; and

2. the characteristic function of $\xi_0$ vanishes nowhere.

Then $\|\pi^\mu_n - \pi^\nu_n\|_{\mathrm{BL}}\to0$ as $n\to\infty$, $\mathbf{P}^\mu$-a.s., whenever
$\mathbf{P}^\mu|_{\sigma\{(Y_k)_{k\ge0}\}} \ll \mathbf{P}^\nu|_{\sigma\{(Y_k)_{k\ge0}\}}$.

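Proof. Let $\mu,\nu\in\mathcal{P}(\mathbb{R}^q)$ satisfy $\mathbf{P}^\mu|_{\sigma\{(Y_k)_{k\ge0}\}} \ll \mathbf{P}^\nu|_{\sigma\{(Y_k)_{k\ge0}\}}$, and let
$\xi\in\mathcal{P}(\mathbb{R}^q)$ be the law of $\xi_0$. It is easily verified that for any $\rho\in\mathcal{P}(\mathbb{R}^q)$

$$\mathbf{P}^\rho(Y_{n+1}\in\cdot\,|\,Y_0,\ldots,Y_n) = \pi^\rho_n h^{-1} * \xi.$$

The classical result of Blackwell and Dubins [2], Section 2, shows that

$$\|\pi^\mu_n h^{-1} * \xi - \pi^\nu_n h^{-1} * \xi\|_{\mathrm{TV}} \xrightarrow{n\to\infty} 0, \qquad \mathbf{P}^\mu\text{-a.s.}$$

We therefore obtain by Proposition C.2

$$\|\pi^\mu_n h^{-1} - \pi^\nu_n h^{-1}\|_{\mathrm{BL}} \xrightarrow{n\to\infty} 0, \qquad \mathbf{P}^\mu\text{-a.s.}$$

As the bounded-Lipschitz functions are uniformly dense in $U_b(\mathbb{R}^q)$ [12],
Lemma 8,

$$|\pi^\mu_n(f\circ h) - \pi^\nu_n(f\circ h)| \xrightarrow{n\to\infty} 0 \quad\text{for all } f\in U_b(\mathbb{R}^q), \qquad \mathbf{P}^\mu\text{-a.s.},$$

where the $\mathbf{P}^\mu$-exceptional set does not depend on $f$. But $h$ has a uniformly
continuous inverse, so any function in $U_b(\mathbb{R}^q)$ can be written as $f\circ h$ for
some $f\in U_b(\mathbb{R}^q)$. The result now follows from Corollary B.4. □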
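
The identity for the conditional law of $Y_{n+1}$ that opens the proof can be unpacked as follows. This is a sketch only: it writes $\pi^\rho_n$ for the one step predictor and $\xi$ for the law of $\xi_0$ (notation assumed here), and it uses nothing beyond the independence of $\xi_{n+1}$ from the signal and from $Y_0,\ldots,Y_n$.

```latex
% For bounded measurable f, use Y_{n+1} = h(X_{n+1}) + \xi_{n+1} and integrate out
% the independent noise first, then the predicted signal:
\mathbf{E}^\rho\bigl(f(Y_{n+1}) \mid Y_0,\ldots,Y_n\bigr)
  = \int\!\!\int f\bigl(h(x)+z\bigr)\,\xi(dz)\,\pi^\rho_n(dx)
  = \int f \, d\bigl(\pi^\rho_n h^{-1} * \xi\bigr),
% where \pi^\rho_n h^{-1} denotes the image of \pi^\rho_n under h.
```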

It should be noted, in particular, that this result places no conditions
whatsoever on the signal process $X_n$ except the Markov property. However,
this result is a statement about the one step predictor and not about the
filter. In continuous time, one can obtain filtered estimates at time $t$ by
taking the limit of predictive estimates over the time interval $[t, t+\delta]$ as
$\delta\searrow0$. The chief difficulty in the proof of Theorem 3.9 is to show that the
limits as $\delta\searrow0$ and $t\to\infty$ can be interchanged.

4. Proof of Theorem 3.3. In the following, we denote by $\mathbb{F}^Y_+$ the family

$$\mathbb{F}^Y_+ = \mathrm{span}\{f_1(Y_{t_1}-Y_0)\cdots f_k(Y_{t_k}-Y_0) : f_1,\ldots,f_k\in B(\mathbb{R}^q),\ t_1,\ldots,t_k\in[0,\infty[,\ k\in\mathbb{N}\}$$

of $\mathcal{F}^Y_+$-measurable cylindrical random variables. Before we turn to the proof
of Theorem 3.3, we introduce two elementary lemmas.

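Lemma 4.1. There is a countable $\mathbb{H}^Y_+\subset\mathbb{F}^Y_+$, $\sup_{h\in\mathbb{H}^Y_+}\|h\|_\infty\le1$, so that

$$\|\mathbf{P}^\mu|_{\mathcal{F}^Y_+} - \mathbf{P}^\nu|_{\mathcal{F}^Y_+}\|_{\mathrm{TV}} = \sup_{h\in\mathbb{H}^Y_+}\left|\int h\,d\mathbf{P}^\mu - \int h\,d\mathbf{P}^\nu\right| \quad\text{for all } \mu,\nu\in\mathcal{P}(E).$$

Proof. Note that for any $\mu\in\mathcal{P}(E)$, the $\sigma$-fields $\mathcal{F}^Y_+$ and $\mathcal{G}^Y$ coincide
$\mathbf{P}^\mu$-a.s. By [15], Proposition 3.7.1, we have $\mathcal{G}^Y = \bigvee_{k\ge0}\mathcal{G}^{Y,k}$ where

$$\mathcal{G}^{Y,k} = \sigma\{Y_{2^{-k}\ell} - Y_0 : \ell = 1,\ldots,4^k\}.$$

Choose a countable dense set $\{x_p\}\subset\mathbb{R}^q$, and consider the countable collection
of open balls $B_{p,m} = \{x\in\mathbb{R}^q : |x-x_p| < 1/m\}$. Then $\mathcal{G}^{Y,k} = \bigvee_{n\ge0}\mathcal{G}^{Y,k,n}$
with

$$\mathcal{G}^{Y,k,n} = \sigma\{Y_{2^{-k}\ell} - Y_0\in B_{p_\ell,m_\ell} : p_\ell, m_\ell = 1,\ldots,n,\ \ell = 1,\ldots,4^k\}.$$

Now note that every $\mathcal{G}^{Y,k,n}$ consists of a finite number of sets in $\mathcal{G}^Y$, and
for every $A\in\mathcal{G}^{Y,k,n}$ the indicator function $I_A\in\mathbb{F}^Y_+$. But $\mathcal{G}^{Y,k,n}\nearrow\mathcal{G}^{Y,k}$ as
$n\to\infty$ and $\mathcal{G}^{Y,k}\nearrow\mathcal{G}^Y$ as $k\to\infty$, so that we can evidently estimate

$$\|\mathbf{P}^\mu|_{\mathcal{F}^Y_+} - \mathbf{P}^\nu|_{\mathcal{F}^Y_+}\|_{\mathrm{TV}} = \|\mathbf{P}^\mu|_{\mathcal{G}^Y} - \mathbf{P}^\nu|_{\mathcal{G}^Y}\|_{\mathrm{TV}}$$

$$= 2\lim_{k\to\infty}\lim_{n\to\infty}\max_{A\in\mathcal{G}^{Y,k,n}}|\mathbf{P}^\mu(A) - \mathbf{P}^\nu(A)|$$

$$\le \sup_{h\in\mathbb{H}^Y_+}\left|\int h\,d\mathbf{P}^\mu - \int h\,d\mathbf{P}^\nu\right|,$$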

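where we have defined the countable family $\mathbb{H}^Y_+\subset\mathbb{F}^Y_+$ as

$$\mathbb{H}^Y_+ = \bigcup_{k,n\in\mathbb{N}}\{2I_A - 1 : A\in\mathcal{G}^{Y,k,n}\}.$$

On the other hand, the reverse inequality is immediate. □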
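We will also need the following.

Lemma 4.2. For any $\xi\in\mathbb{F}^Y_+$, $\mu\in\mathcal{P}(E)$ and $t\in[0,\infty[$, we have

$$\mathbf{E}^\mu(\xi\circ\theta_t\,|\,\mathcal{F}^Y_t) = \mathbf{E}^{\pi^\mu_t}(\xi), \qquad \mathbf{P}^\mu\text{-a.s.}$$

Proof. By the Markov additive property of our model,

$$\mathbf{E}^\mu(f(Y_{s+t}-Y_t)\,|\,\mathcal{F}_t) = \mathbf{E}^\mu(f(Y_{s+t}-y)\,|\,\mathcal{F}_t)|_{y=Y_t}$$

$$= T_sS_yf(X_t,Y_t)|_{y=Y_t} = T_sf(X_t,0) = \mathbf{E}^{X_t}(f(Y_s-Y_0)).$$

Along the same lines $\mathbf{E}^\mu(\xi\circ\theta_t\,|\,\mathcal{F}_t) = \mathbf{E}^{X_t}(\xi)$ for any $\xi\in\mathbb{F}^Y_+$. Therefore

$$\mathbf{E}^\mu(\xi\circ\theta_t\,|\,\mathcal{F}^Y_t) = \mathbf{E}^\mu(\mathbf{E}^{X_t}(\xi)\,|\,\mathcal{F}^Y_t) = \int\mathbf{E}^x(\xi)\,\pi^\mu_t(dx) = \mathbf{E}^{\pi^\mu_t}(\xi)$$

by the tower property of the conditional expectation, and the result follows.
□

We now turn to the proof of Theorem 3.3. We assume throughout the
proof that $\mathbf{P}^\mu|_{\mathcal{F}^Y_+} \ll \mathbf{P}^\nu|_{\mathcal{F}^Y_+}$, so that in particular both $\pi^\mu_t$ and $\pi^\nu_t$ are defined
uniquely $\mathbf{P}^\mu$-a.s.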

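Proof of Theorem 3.3. Let $\xi\in\mathbb{H}^Y_+$; then we have

$$|\mathbf{E}^{\pi^\mu_t}(\xi) - \mathbf{E}^{\pi^\nu_t}(\xi)| = \frac{|\mathbf{E}^\nu(\{\Lambda - \mathbf{E}^\nu(\Lambda\,|\,\mathcal{F}^Y_t)\}\,\xi\circ\theta_t\,|\,\mathcal{F}^Y_t)|}{\mathbf{E}^\nu(\Lambda\,|\,\mathcal{F}^Y_t)}, \qquad \mathbf{P}^\mu\text{-a.s.}$$

by Lemma 4.2 and the Bayes formula, where $\Lambda := d\mathbf{P}^\mu|_{\mathcal{F}^Y_+}/d\mathbf{P}^\nu|_{\mathcal{F}^Y_+}$. But $\mathbb{H}^Y_+$
is countable and $\sup_{\xi\in\mathbb{H}^Y_+}\|\xi\|_\infty\le1$, so we can evidently estimate

$$\sup_{\xi\in\mathbb{H}^Y_+}|\mathbf{E}^{\pi^\mu_t}(\xi) - \mathbf{E}^{\pi^\nu_t}(\xi)| \le \frac{\mathbf{E}^\nu(M_t\,|\,\mathcal{F}^Y_t)}{\mathbf{E}^\nu(\Lambda\,|\,\mathcal{F}^Y_t)} \quad\text{for all } t\in\mathbb{Q}_+, \qquad \mathbf{P}^\mu\text{-a.s.},$$

where $M_t := |\Lambda - \mathbf{E}^\nu(\Lambda\,|\,\mathcal{F}^Y_t)|$. It should be noted that as $\mathbf{P}^\mu|_{\mathcal{F}^Y_+} \ll \mathbf{P}^\nu|_{\mathcal{F}^Y_+}$,
all the preceding quantities are $\mathbf{P}^\mu|_{\mathcal{F}^Y_+}$-a.s. uniquely defined and we have
implicitly taken only countable intersections of sets of full measure (as $\mathbb{H}^Y_+$
and $\mathbb{Q}_+$ are countable).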
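
The first display of the proof combines Lemma 4.2 with the abstract Bayes formula. The intermediate algebra can be sketched as follows, under the assumption that $\Lambda$ denotes the density of $\mathbf{P}^\mu$ with respect to $\mathbf{P}^\nu$ on the observation $\sigma$-field and that $\mathcal{F}^Y_t$ denotes the observation filtration (notation assumed here):

```latex
% Bayes formula for conditional expectations under an absolutely continuous change of measure:
\mathbf{E}^\mu(\xi\circ\theta_t \mid \mathcal{F}^Y_t)
  = \frac{\mathbf{E}^\nu(\Lambda\,\xi\circ\theta_t \mid \mathcal{F}^Y_t)}
         {\mathbf{E}^\nu(\Lambda \mid \mathcal{F}^Y_t)} .
% Subtract E^nu(xi o theta_t | F^Y_t) and place both terms over the common denominator;
% E^nu(Lambda | F^Y_t) is F^Y_t-measurable, so it can be moved inside the conditional expectation:
\mathbf{E}^\mu(\xi\circ\theta_t \mid \mathcal{F}^Y_t) - \mathbf{E}^\nu(\xi\circ\theta_t \mid \mathcal{F}^Y_t)
  = \frac{\mathbf{E}^\nu\bigl(\{\Lambda - \mathbf{E}^\nu(\Lambda \mid \mathcal{F}^Y_t)\}\,\xi\circ\theta_t \mid \mathcal{F}^Y_t\bigr)}
         {\mathbf{E}^\nu(\Lambda \mid \mathcal{F}^Y_t)} .
% Lemma 4.2 identifies the two terms on the left with E^{\pi^\mu_t}(\xi) and E^{\pi^\nu_t}(\xi);
% the bound by E^nu(M_t | F^Y_t) then uses |\xi| \le 1.
```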

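We now claim that $\mathbf{E}^\nu(M_t\,|\,\mathcal{F}^Y_t)\to0$ $\mathbf{P}^\mu$-a.s. as $t\to\infty$ along the rationals
$t\in\mathbb{Q}_+$. To see this, define $M^k_t := |\Lambda I_{\Lambda\le k} - \mathbf{E}^\nu(\Lambda I_{\Lambda\le k}\,|\,\mathcal{F}^Y_t)|$, and note that

$$M_t \le M^k_t + \Lambda I_{\Lambda>k} + \mathbf{E}^\nu(\Lambda I_{\Lambda>k}\,|\,\mathcal{F}^Y_t) \quad\text{for all } t\in\mathbb{Q}_+,\ k\in\mathbb{N}, \qquad \mathbf{P}^\mu\text{-a.s.}$$

Therefore we obtain, using that trivially $2\Lambda I_{\Lambda>k}\to0$ as $k\to\infty$ $\mathbf{P}^\mu$-a.s.,

$$\limsup_{t\to\infty,\ t\in\mathbb{Q}_+}\mathbf{E}^\nu(M_t\,|\,\mathcal{F}^Y_t) \le \limsup_{k\to\infty}\ \limsup_{t\to\infty,\ t\in\mathbb{Q}_+}\mathbf{E}^\nu(M^k_t\,|\,\mathcal{F}^Y_t), \qquad \mathbf{P}^\mu\text{-a.s.}$$

But as by construction $\mathbf{P}^\mu$-a.s. $M^k_t\le k$ for all $t\in\mathbb{Q}_+$, $k\in\mathbb{N}$ and as $\mathbf{P}^\mu$-a.s.
$\limsup_{t\to\infty,\ t\in\mathbb{Q}_+}M^k_t = 0$ for all $k\in\mathbb{N}$ by martingale convergence, we have

$$\limsup_{t\to\infty,\ t\in\mathbb{Q}_+}\mathbf{E}^\nu(M^k_t\,|\,\mathcal{F}^Y_t) \le \limsup_{s\to\infty}\ \limsup_{t\to\infty,\ t\in\mathbb{Q}_+}\mathbf{E}^\nu\Bigl(\sup_{r\ge s,\ r\in\mathbb{Q}_+}M^k_r\,\Big|\,\mathcal{F}^Y_t\Bigr) = 0, \qquad \mathbf{P}^\mu\text{-a.s.}$$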
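As by construction $\Lambda > 0$ $\mathbf{P}^\mu$-a.s., we have evidently established that

$$\sup_{\xi\in\mathbb{H}^Y_+}|\mathbf{E}^{\pi^\mu_t}(\xi) - \mathbf{E}^{\pi^\nu_t}(\xi)| \xrightarrow{t\to\infty} 0, \qquad \mathbf{P}^\mu\text{-a.s.}$$

Denote by $\Omega_0\subset\Omega$ a set of $\mathbf{P}^\mu$-full measure on which this convergence holds.
Then, by Lemma 4.1,

$$\|\mathbf{P}^{\pi^\mu(t_k,\omega,\cdot)}|_{\mathcal{F}^Y_+} - \mathbf{P}^{\pi^\nu(t_k,\omega,\cdot)}|_{\mathcal{F}^Y_+}\|_{\mathrm{TV}} \xrightarrow{k\to\infty} 0$$

for every $\omega\in\Omega_0$ and every subsequence $\{t_k\}\subset\mathbb{Q}_+$ such that $t_k\nearrow\infty$. As
the model is presumed to be $G$-uniformly observable, this implies that

$$\|\pi^\mu(t_k,\omega,\cdot) - \pi^\nu(t_k,\omega,\cdot)\|_G \xrightarrow{k\to\infty} 0$$

for every $\omega\in\Omega_0$ and every subsequence $\{t_k\}\subset\mathbb{Q}_+$ such that $t_k\nearrow\infty$.
But as $G$ is uniformly bounded and equicontinuous and as $\pi^\mu(t,\omega,\cdot)$ and
$\pi^\nu(t,\omega,\cdot)$ are càdlàg in the topology of $\mathcal{P}(E)$ by Lemma 2.3, it follows from
[23], Theorem II.6.8, that $t\mapsto\|\pi^\mu_t - \pi^\nu_t\|_G$ is càdlàg. We therefore obtain

$$\|\pi^\mu(t,\omega,\cdot) - \pi^\nu(t,\omega,\cdot)\|_G \xrightarrow{t\to\infty} 0 \quad\text{for all } \omega\in\Omega_0.$$

This completes the proof. □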

5. Proof of Theorem 3.9. In the proof of Theorem 3.9 we will make essential
use of the flow generated by the deterministic part of the signal process:
define $\eta_t(x)$, for every $x\in\mathbb{R}^q$, as the solution of the ordinary differential
equation

$$\eta_t(x) = x + \int_0^t b(\eta_s(x))\,ds.$$

Existence and uniqueness follow from the global Lipschitz property of $b$.
The special form of $h$ is essential, as it allows us to establish the following.

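Lemma 5.1. Let $h(x) = Cx + h_0(x)$, where $C$ is an invertible matrix and
$\|C^{-1}h_0\|_L < 1$. Then there exist constants $\varepsilon_0 > 0$ and $m, M > 0$ such that

$$m\|x-y\| \le \left\|\frac{1}{\varepsilon}\int_0^\varepsilon h(\eta_s(x))\,ds - \frac{1}{\varepsilon}\int_0^\varepsilon h(\eta_s(y))\,ds\right\| \le M\|x-y\|$$

for every $\varepsilon < \varepsilon_0$ and $x,y\in\mathbb{R}^q$.

Proof. Let us define

$$H_\varepsilon(x) := \frac{1}{\varepsilon}\int_0^\varepsilon h(\eta_s(x))\,ds$$

and note that we can write

$$C^{-1}H_\varepsilon(x) = \frac{1}{\varepsilon}\int_0^\varepsilon\eta_s(x)\,ds + \frac{1}{\varepsilon}\int_0^\varepsilon C^{-1}h_0(\eta_s(x))\,ds.$$

We now estimate as follows.

$$\|x-y\| \le \|C^{-1}H_\varepsilon(x) - C^{-1}H_\varepsilon(y)\| + \|C^{-1}H_\varepsilon(x) - C^{-1}H_\varepsilon(y) - (x-y)\|$$

$$\le \|C^{-1}H_\varepsilon(x) - C^{-1}H_\varepsilon(y)\| + \frac{1}{\varepsilon}\int_0^\varepsilon\|\eta_s(x) - \eta_s(y) - (x-y)\|\,ds + \frac{1}{\varepsilon}\int_0^\varepsilon\|C^{-1}h_0(\eta_s(x)) - C^{-1}h_0(\eta_s(y))\|\,ds$$

$$\le \|C^{-1}\|\,\|H_\varepsilon(x) - H_\varepsilon(y)\| + \frac{1}{\varepsilon}\int_0^\varepsilon\!\!\int_0^s\|b(\eta_r(x)) - b(\eta_r(y))\|\,dr\,ds + \|C^{-1}h_0\|_L\,\frac{1}{\varepsilon}\int_0^\varepsilon\|\eta_s(x) - \eta_s(y)\|\,ds$$

$$\le \|C^{-1}\|\,\|H_\varepsilon(x) - H_\varepsilon(y)\| + (\|C^{-1}h_0\|_L + \|b\|_L\varepsilon/2)\sup_{s\le\varepsilon}\|\eta_s(x) - \eta_s(y)\|.$$

But note that

$$\|\eta_s(x) - \eta_s(y)\| \le \|x-y\| + \|b\|_L\int_0^s\|\eta_r(x) - \eta_r(y)\|\,dr,$$

so that by Gronwall's lemma

$$\sup_{s\le\varepsilon}\|\eta_s(x) - \eta_s(y)\| \le e^{\|b\|_L\varepsilon}\|x-y\|.$$

We therefore find that for all $x,y\in\mathbb{R}^q$ and $\varepsilon > 0$

$$\frac{1 - \|C^{-1}h_0\|_Le^{\|b\|_L\varepsilon} - \|b\|_L\varepsilon e^{\|b\|_L\varepsilon}/2}{\|C^{-1}\|}\,\|x-y\| \le \|H_\varepsilon(x) - H_\varepsilon(y)\|.$$

But evidently

$$\frac{1 - \|C^{-1}h_0\|_Le^{\|b\|_L\varepsilon} - \|b\|_L\varepsilon e^{\|b\|_L\varepsilon}/2}{\|C^{-1}\|} \longrightarrow \frac{1 - \|C^{-1}h_0\|_L}{\|C^{-1}\|} > 0 \quad\text{as } \varepsilon\searrow0.$$

This establishes the lower bound. For the upper bound, note that

$$\|H_\varepsilon(x) - H_\varepsilon(y)\| \le \frac{1}{\varepsilon}\int_0^\varepsilon\|h(\eta_s(x)) - h(\eta_s(y))\|\,ds \le \|h\|_L\sup_{s\le\varepsilon}\|\eta_s(x) - \eta_s(y)\| \le \|h\|_Le^{\|b\|_L\varepsilon}\|x-y\|.$$

The proof is complete. □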
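
The lower bound in the proof is explicit, so admissible constants can be computed. The following is a minimal sketch under stated assumptions: the inputs `kappa` (standing for $\|C^{-1}h_0\|_L$), `lip_b` (for $\|b\|_L$) and `norm_c_inv` (for $\|C^{-1}\|$) are supplied by the user, and the halving rule and the choice of $m$ as half of the limiting constant are illustrative conventions introduced here, not the paper's.

```python
import math

def lower_constant(kappa, lip_b, norm_c_inv, eps):
    """(1 - kappa e^{L eps} - L eps e^{L eps} / 2) / ||C^{-1}||, with L = lip_b."""
    growth = math.exp(lip_b * eps)
    return (1.0 - kappa * growth - 0.5 * lip_b * eps * growth) / norm_c_inv

def find_eps0(kappa, lip_b, norm_c_inv, start=1.0):
    """Halve eps until the lower constant reaches half of its eps -> 0 limit.

    Returns (eps0, m) with m = (1 - kappa) / (2 ||C^{-1}||). The lower constant
    is decreasing in eps, so the bound m also holds for every eps below eps0.
    """
    if not 0 <= kappa < 1:
        raise ValueError("need 0 <= kappa < 1")
    m = 0.5 * (1.0 - kappa) / norm_c_inv
    eps = start
    while lower_constant(kappa, lip_b, norm_c_inv, eps) < m:
        eps /= 2
    return eps, m

# Hypothetical inputs chosen for illustration only.
eps0, m = find_eps0(kappa=0.5, lip_b=1.0, norm_c_inv=1.0)
print(eps0, m)
```

A matching upper constant is $M = \|h\|_L e^{\|b\|_L\varepsilon_0}$, read off from the last display of the proof.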

The following lemma is used to reduce the proof of Theorem 3.9 to the
study of the deterministic part $\eta_t(x)$, rather than working with the fully
stochastic signal $X_t$. It is here that the boundedness of the diffusion
coefficient $a$ is used.

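Lemma 5.2. Provided that $a$ is bounded, we have

$$\sup_{s\le t}\ \sup_{\mu\in\mathcal{P}(\mathbb{R}^q)}\mathbf{E}^\mu(\|X_s - \eta_s(X_0)\|) \xrightarrow{t\to0} 0.$$

Proof. For every $x\in\mathbb{R}^q$, let $\xi_t(x)$ be the solution of

$$\xi_t(x) = x + \int_0^t b(\xi_s(x))\,ds + \int_0^t a(\xi_s(x))\,dW_s.$$

By the global Lipschitz property of the coefficients, the solution is uniquely
defined and is square integrable for every $x\in\mathbb{R}^q$. We therefore obtain using
Itô's rule

$$\mathbf{E}(\|\xi_t(x) - \eta_t(x)\|^2) = \mathbf{E}\left[\int_0^t\{2\langle\xi_s(x) - \eta_s(x),\ b(\xi_s(x)) - b(\eta_s(x))\rangle + \alpha(\xi_s(x))\}\,ds\right],$$

where $\alpha(x) = \mathrm{Tr}[a(x)^*a(x)]$. Note that as we have assumed that $a$ is uniformly
bounded, $\alpha(x)$ is also uniformly bounded: $\alpha(x)\le K < \infty$. Therefore

$$\mathbf{E}(\|\xi_t(x) - \eta_t(x)\|^2) \le Kt + 2\|b\|_L\int_0^t\mathbf{E}(\|\xi_s(x) - \eta_s(x)\|^2)\,ds.$$

By Gronwall's lemma, we obtain for every $T < \infty$ and $x\in\mathbb{R}^q$

$$\sup_{t\le T}\mathbf{E}(\|\xi_t(x) - \eta_t(x)\|^2) \le KTe^{2\|b\|_LT}.$$

By Jensen's inequality, we find that for every $T < \infty$

$$\sup_{t\le T}\ \sup_{x\in\mathbb{R}^q}\mathbf{E}(\|\xi_t(x) - \eta_t(x)\|) \le e^{\|b\|_LT}\sqrt{KT}.$$

It remains to note that

$$\sup_{\mu\in\mathcal{P}(\mathbb{R}^q)}\mathbf{E}^\mu(\|X_t - \eta_t(X_0)\|) = \sup_{\mu\in\mathcal{P}(\mathbb{R}^q)}\int\mathbf{E}^x(\|X_t - \eta_t(X_0)\|)\,\mu(dx)$$

$$= \sup_{x\in\mathbb{R}^q}\mathbf{E}^x(\|X_t - \eta_t(X_0)\|)$$

$$= \sup_{x\in\mathbb{R}^q}\mathbf{E}(\|\xi_t(x) - \eta_t(x)\|).$$

The proof is complete. □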
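
The bound $e^{\|b\|_LT}\sqrt{KT}$ can be compared with simulation in a simple scalar case. The sketch below is illustrative only: the coefficients $b(x) = -x$ and $a(x) = 1$, the Euler-Maruyama discretization, the step count, the path count and the seed are all assumptions made here, and the Monte Carlo estimate carries discretization and sampling error.

```python
import math
import random

def mean_deviation(b, a, x0, t, n_steps=100, n_paths=2000, seed=0):
    """Euler-Maruyama estimate of E|xi_t(x0) - eta_t(x0)| for a scalar SDE.

    eta solves the noise-free equation d eta = b(eta) dt, discretized with the
    same explicit Euler step as the stochastic path xi.
    """
    rng = random.Random(seed)
    dt = t / n_steps
    eta = x0
    for _ in range(n_steps):
        eta += b(eta) * dt
    total = 0.0
    for _ in range(n_paths):
        xi = x0
        for _ in range(n_steps):
            xi += b(xi) * dt + a(xi) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        total += abs(xi - eta)
    return total / n_paths

def lemma_bound(lip_b, K, T):
    """Right-hand side e^{L T} sqrt(K T) from the proof of Lemma 5.2."""
    return math.exp(lip_b * T) * math.sqrt(K * T)

# Hypothetical coefficients: b(x) = -x has Lipschitz constant 1, a(x) = 1 gives K = 1.
estimate = mean_deviation(lambda x: -x, lambda x: 1.0, x0=2.0, t=0.5)
bound = lemma_bound(lip_b=1.0, K=1.0, T=0.5)
print(estimate, bound)
```

For these coefficients the deviation is an Ornstein-Uhlenbeck process started at zero, so the estimate sits well below the bound, which is not designed to be sharp.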

We can now proceed with the proof of Theorem 3.9. 

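Proof of Theorem 3.9. Let us fix two sequences $\{\mu_n\},\{\nu_n\}\subset\mathcal{P}(\mathbb{R}^q)$
so that $\|\mathbf{P}^{\mu_n}|_{\mathcal{F}^Y_+} - \mathbf{P}^{\nu_n}|_{\mathcal{F}^Y_+}\|_{\mathrm{TV}}\to0$, a constant $\alpha > 0$, and a function
$f\in\mathrm{Lip}(\mathbb{R}^q)$. In the following $\varepsilon_0, m, M > 0$ are as defined in Lemma 5.1, and we
define

$$H_\varepsilon(x) := \frac{1}{\varepsilon}\int_0^\varepsilon h(\eta_s(x))\,ds, \qquad \bar H_\varepsilon := \frac{1}{\varepsilon}\int_0^\varepsilon h(X_s)\,ds.$$

By Lemma 5.2, we may choose $\varepsilon < \varepsilon_0$ such that

$$\sup_{s\le\varepsilon}\ \sup_{\mu\in\mathcal{P}(\mathbb{R}^q)}\mathbf{E}^\mu(\|X_s - \eta_s(X_0)\|) \le \alpha.$$

By Lemma 5.1, there is an (unbounded) Lipschitz function $g_\varepsilon$, with $\|g_\varepsilon\|_L\le m^{-1}$,
such that $g_\varepsilon(H_\varepsilon(x)) = x$ for all $x\in\mathbb{R}^q$. In particular, we have for all
$n\in\mathbb{N}$

$$\left|\int f\,d\mu_n - \int f\,d\nu_n\right| = |\mathbf{E}^{\mu_n}(f_\varepsilon(H_\varepsilon(X_0))) - \mathbf{E}^{\nu_n}(f_\varepsilon(H_\varepsilon(X_0)))|,$$

where we have written $f_\varepsilon := f\circ g_\varepsilon$. Now note that

$$\sup_{\mu\in\mathcal{P}(\mathbb{R}^q)}|\mathbf{E}^\mu(f_\varepsilon(H_\varepsilon(X_0))) - \mathbf{E}^\mu(f_\varepsilon(\bar H_\varepsilon))| \le \|f_\varepsilon\|_L\|h\|_L\,\frac{1}{\varepsilon}\int_0^\varepsilon\sup_{\mu\in\mathcal{P}(\mathbb{R}^q)}\mathbf{E}^\mu(\|X_s - \eta_s(X_0)\|)\,ds \le \|h\|_Lm^{-1}\alpha.$$

Therefore, we have for all $n\in\mathbb{N}$

$$\left|\int f\,d\mu_n - \int f\,d\nu_n\right| \le 2\|h\|_Lm^{-1}\alpha + |\mathbf{E}^{\mu_n}(f_\varepsilon(\bar H_\varepsilon)) - \mathbf{E}^{\nu_n}(f_\varepsilon(\bar H_\varepsilon))|.$$

To proceed, note that

$$\frac{Y_\varepsilon}{\varepsilon} = \bar H_\varepsilon + \frac{DV_\varepsilon}{\varepsilon}.$$

As $DV_\varepsilon/\varepsilon$ is Gaussian, its characteristic function vanishes nowhere. Therefore,
using that $f_\varepsilon\in U_b(\mathbb{R}^q)$ and Proposition C.1, we may choose $u_\varepsilon\in U_b(\mathbb{R}^q)$
so that

$$\sup_{\mu\in\mathcal{P}(\mathbb{R}^q)}|\mathbf{E}^\mu(f_\varepsilon(\bar H_\varepsilon)) - \mathbf{E}^\mu(u_\varepsilon(Y_\varepsilon/\varepsilon))| \le \alpha.$$

We thus obtain for every $n\in\mathbb{N}$

$$\left|\int f\,d\mu_n - \int f\,d\nu_n\right| \le 2\alpha(1 + \|h\|_Lm^{-1}) + |\mathbf{E}^{\mu_n}(u_\varepsilon(Y_\varepsilon/\varepsilon)) - \mathbf{E}^{\nu_n}(u_\varepsilon(Y_\varepsilon/\varepsilon))|.$$

But as $\|\mathbf{P}^{\mu_n}|_{\mathcal{F}^Y_+} - \mathbf{P}^{\nu_n}|_{\mathcal{F}^Y_+}\|_{\mathrm{TV}}\to0$, we evidently have

$$\limsup_{n\to\infty}\left|\int f\,d\mu_n - \int f\,d\nu_n\right| \le 2\alpha(1 + \|h\|_Lm^{-1}).$$

Now note that $\alpha > 0$ and $f\in\mathrm{Lip}(\mathbb{R}^q)$ were arbitrary, so evidently

$$\left|\int f\,d\mu_n - \int f\,d\nu_n\right| \xrightarrow{n\to\infty} 0 \quad\text{for all } f\in\mathrm{Lip}(\mathbb{R}^q).$$

The result follows from Corollary B.4. □

APPENDIX A: MEASURABILITY OF PROBABILITY DISTANCES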

The goal of this appendix is to prove that the distance $\|\mu-\nu\|_G$ between
two probability kernels $\mu$, $\nu$ is measurable, provided that the family
$G\subset C_b(S)$ is chosen appropriately. To this end we prove the following lemma.



Lemma A.1. Let $G\subset C_b(S)$ be uniformly bounded and equicontinuous.
Then there is a countable collection $\{g_n : n\in\mathbb{N}\}\subset G$ such that

$$\|\mu-\nu\|_G = \sup_n\left|\int g_n\,d\mu - \int g_n\,d\nu\right| \quad\text{for all } \mu,\nu\in\mathcal{P}(S).$$



Proof. $\mathcal{P}(S)$ is Polish, so there is a countable dense subset
$\{\mu_n : n\in\mathbb{N}\}\subset\mathcal{P}(S)$. As any probability measure on a Polish space is tight,
we can find for every $n,m,p\in\mathbb{N}$ a compact set $K_{n,m,p}\subset S$ such that
$\mu_n(K_{n,m,p})\ge1-1/p$ and $\mu_m(K_{n,m,p})\ge1-1/p$. Let us write
$G_{n,m,p} = \{f|_{K_{n,m,p}} : f\in G\}\subset C_b(K_{n,m,p})$. By the Arzelà-Ascoli theorem the family
$G_{n,m,p}$ is relatively compact, and thus a fortiori separable, in the topology of
uniform convergence. Therefore, we can find for every $n,m,p\in\mathbb{N}$ a countable
family $\{g^{n,m,p}_k : k\in\mathbb{N}\}\subset G$ such that

$$\forall\,n,m,p\in\mathbb{N},\ g\in G,\ \varepsilon > 0,\ \exists\,k\in\mathbb{N} \text{ s.t. } \sup_{x\in K_{n,m,p}}|g(x) - g^{n,m,p}_k(x)| < \varepsilon.$$

We claim that the countable family $G' = \{g^{n,m,p}_k : n,m,p,k\in\mathbb{N}\}\subset G$ satisfies

$$\|\mu-\nu\|_G = \sup_{g\in G}\left|\int g\,d\mu - \int g\,d\nu\right| = \|\mu-\nu\|_{G'} \quad\text{for all } \mu,\nu\in\mathcal{P}(S).$$

Of course, the inequality $\|\mu-\nu\|_{G'}\le\|\mu-\nu\|_G$ is trivial as $G'\subset G$, so it
suffices to prove that for every $\mu,\nu\in\mathcal{P}(S)$ there is a sequence
$\{h_\ell : \ell\in\mathbb{N}\}\subset G'$ such that $|\mu(h_\ell) - \nu(h_\ell)|\to\|\mu-\nu\|_G$. To this end, let us fix
$\mu,\nu\in\mathcal{P}(S)$, and choose a sequence $\{h'_\ell : \ell\in\mathbb{N}\}\subset G$ such that
$|\mu(h'_\ell) - \nu(h'_\ell)|\to\|\mu-\nu\|_G$. Note that

$$\bigl|\,|\mu(h'_\ell) - \nu(h'_\ell)| - |\mu(h_\ell) - \nu(h_\ell)|\,\bigr| \le |\mu(h'_\ell) - \mu(h_\ell)| + |\nu(h'_\ell) - \nu(h_\ell)|$$

by the reverse triangle inequality, so it suffices to find a sequence
$\{h_\ell : \ell\in\mathbb{N}\}\subset G'$ such that $|\mu(h'_\ell) - \mu(h_\ell)|\to0$ and $|\nu(h'_\ell) - \nu(h_\ell)|\to0$. Fix $\ell\in\mathbb{N}$. By
[23], Theorem II.6.8, we can choose $n,m\in\mathbb{N}$ such that $\|\mu_n-\mu\|_G < 1/\ell$ and
$\|\mu_m-\nu\|_G < 1/\ell$. Choose $k\in\mathbb{N}$ such that $\sup_{x\in K_{n,m,\ell}}|h'_\ell(x) - g^{n,m,\ell}_k(x)| < 1/\ell$,
and set $h_\ell = g^{n,m,\ell}_k$. Then we can estimate as follows:

$$|\mu(h'_\ell) - \mu(h_\ell)| \le |\mu(h'_\ell) - \mu_n(h'_\ell)| + |\mu_n(h'_\ell) - \mu_n(h_\ell)| + |\mu_n(h_\ell) - \mu(h_\ell)|$$

$$\le 2\|\mu_n-\mu\|_G + \left|\int_{K_{n,m,\ell}}(h'_\ell - h_\ell)\,d\mu_n\right| + \left|\int_{K^c_{n,m,\ell}}(h'_\ell - h_\ell)\,d\mu_n\right|$$

$$\le 2\|\mu_n-\mu\|_G + \sup_{x\in K_{n,m,\ell}}|h'_\ell(x) - h_\ell(x)| + 2\sup_{g\in G}\|g\|_\infty\,\mu_n(K^c_{n,m,\ell})$$

$$\le \frac{1}{\ell}\Bigl(3 + 2\sup_{g\in G}\|g\|_\infty\Bigr),$$

where $K^c$ denotes the complement of a set $K$. The identical bound is found
for $|\nu(h'_\ell) - \nu(h_\ell)|$. Repeating the procedure for every $\ell\in\mathbb{N}$, we evidently
construct a sequence $\{h_\ell\}$ with the desired properties. This completes the
proof. □

This result will be used in the following fashion. 

Corollary A.2. Let $(\Omega,\mathcal{F})$ be a measurable space and let
$\mu:\Omega\times\mathcal{B}(S)\to[0,1]$ and $\nu:\Omega\times\mathcal{B}(S)\to[0,1]$ be probability kernels. Moreover, let
$G\subset C_b(S)$ be uniformly bounded and equicontinuous. Then $\|\mu-\nu\|_G$ is a random
variable [i.e., the map $\omega\mapsto\|\mu(\omega,\cdot) - \nu(\omega,\cdot)\|_G$ is measurable].

Proof. Immediate from the previous lemma. □

Corollary A.2 is used implicitly throughout the paper without further
comment.

APPENDIX B: MERGING OF PROBABILITY MEASURES 

It is well known that a sequence of probability measures $\{\mu_n\}\subset\mathcal{P}(S)$
converges weakly to $\mu\in\mathcal{P}(S)$ if and only if $\|\mu_n-\mu\|_{\mathrm{BL}}\to0$ [13], Theorem
11.3.3. In particular, as the class of bounded-Lipschitz functions is uniformly
dense in $U_b(S)$, it follows that if $\mu_n(f)\to\mu(f)$ for all $f\in\mathrm{Lip}(S)$ then
$\|\mu_n-\mu\|_{\mathrm{BL}}\to0$. This is in some sense surprising; evidently the convergence of the
expectation of every function $f\in\mathrm{Lip}(S)$ separately already implies that this
convergence holds uniformly over $\mathrm{Lip}(S)$, without any further assumptions.

The purpose of this appendix is to show that a similar statement holds
for the merging of two sequences of probability measures. This result was
already proved in [22] and in [7], Section 6, for probability measures on any
Polish space. We provide here an alternative and much simpler proof, which
is however restricted to probability measures on $\mathbb{R}^d$. In this paper only the
latter will be needed.

Proposition B.1. Let $\{\mu_n\},\{\nu_n\}\subset\mathcal{P}(\mathbb{R}^d)$ satisfy

$$\left|\int f\,d\mu_n - \int f\,d\nu_n\right| \xrightarrow{n\to\infty} 0 \quad\text{for all } f\in\mathrm{Lip}(\mathbb{R}^d).$$

Moreover, let $G\subset U_b(\mathbb{R}^d)$ be a uniformly bounded and uniformly equicontinuous
family of functions. Then $\|\mu_n-\nu_n\|_G\to0$ as $n\to\infty$.

The proof is based on the following well-known result from Banach space
theory, which states in essence that the claim is true for probability measures
on $\mathbb{N}$ (rather than $\mathbb{R}^d$). An elementary proof can be found in [1], Theorem
4.32.

Lemma B.2 (Schur property of $\ell_1$). A sequence in $\ell_1$ converges in the
weak topology if and only if it converges in the norm topology.

Remark B.3. Note that this result only holds for sequences. Indeed, it
cannot hold for nets, as that would imply that the weak and norm topologies
coincide.
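
A standard illustration, not taken from the paper, may help to see what the Schur property says and why it is special to $\ell_1$. It uses only the unit vectors $e_n$ and the usual dual pairings:

```latex
% The unit vectors e_n = (0,\ldots,0,1,0,\ldots) have norm one in \ell_1:
\|e_n\|_{\ell_1} = 1 \quad\text{for all } n .
% They do not converge weakly to zero in \ell_1 either: pairing with
% z = (1,1,1,\ldots) \in \ell_\infty = (\ell_1)^* gives
\langle z, e_n\rangle = 1 \not\to 0 ,
% which is consistent with Lemma B.2. In \ell_2, by contrast, for every z \in \ell_2
\langle z, e_n\rangle = z_n \xrightarrow{n\to\infty} 0
  \quad\text{while}\quad \|e_n\|_{\ell_2} = 1 ,
% so in \ell_2 a sequence can converge weakly without converging in norm.
```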



We now turn to the proof of Proposition B.1. The basic idea is to reduce
to the setting of Lemma B.2 by introducing a partition of unity.

Proof of Proposition B.1. For every $\alpha > 0$, define the countable
family of functions

$$V^\alpha = \{x\mapsto\varphi_{k_1}(\alpha x^1)\cdots\varphi_{k_d}(\alpha x^d) : (k_1,\ldots,k_d)\in\mathbb{Z}^d\}\subset U_b(\mathbb{R}^d),$$

where $\varphi_k(x) := \cos^2(\pi(x-k)/2)\,I_{|x-k|<1}$. The following facts are easily
verified:

1. $0\le\varphi(x)\le1$ for all $\varphi\in V^\alpha$;

2. For every $x\in\mathbb{R}^d$, at most $N$ elements $\varphi\in V^\alpha$ satisfy $\varphi(x) > 0$ (where
$N\in\mathbb{N}$ depends only on the state space dimension $d$);

3. For every $x\in\mathbb{R}^d$, we have $\sum_{\varphi\in V^\alpha}\varphi(x) = 1$;

4. $\sup_{\varphi\in V^\alpha}\|\varphi\|_L < \infty$ (i.e., $V^\alpha$ is equilipschitzian).

Thus $V^\alpha$ is a partition of unity of $\mathbb{R}^d$ with some additional uniformity
properties (which will be important in the following).
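
The first three facts can be checked numerically in dimension one. The sketch below is a minimal illustration with $d = 1$; the function names and the test points are choices made here, and the candidate index range relies on each bump being supported on an interval of length two around its index.

```python
import math

def phi(k, x):
    """phi_k(x) = cos^2(pi (x - k) / 2) on |x - k| < 1, and 0 elsewhere."""
    u = x - k
    return math.cos(math.pi * u / 2) ** 2 if abs(u) < 1 else 0.0

def active_bumps(alpha, x):
    """Indices k with phi_k(alpha * x) > 0 in the one-dimensional case."""
    y = alpha * x
    base = math.floor(y)
    # Only indices within distance one of y can contribute.
    return [k for k in (base - 1, base, base + 1, base + 2) if phi(k, y) > 0]

def partition_sum(alpha, x):
    """Sum over the family V^alpha of the bump values at x (d = 1)."""
    return sum(phi(k, alpha * x) for k in active_bumps(alpha, x))

points = [-3.2, -1.0, 0.0, 0.25, 1.999, 7.5]
sums = [partition_sum(alpha, x) for alpha in (1.0, 4.0) for x in points]
counts = [len(active_bumps(alpha, x)) for alpha in (1.0, 4.0) for x in points]
print(sums, counts)
```

In this one-dimensional case at most two bumps are active at any point, and each function $x\mapsto\varphi_k(\alpha x)$ is supported on an interval of radius $1/\alpha$, which is why a large $\alpha$ yields small supports.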

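Fix $\varepsilon > 0$. As $G$ is uniformly equicontinuous, there exists a $\delta > 0$ such that
$\|x-y\| < \delta$ implies $|g(x) - g(y)| < \varepsilon$ for all $g\in G$. Choose $\alpha$ large enough so
that every element of $V^\alpha$ is supported inside a ball of radius $\delta$. Moreover,
choose for every $\varphi\in V^\alpha$ an arbitrary point $x_\varphi\in\mathbb{R}^d$ in the support of $\varphi$.
Then

$$\|\mu_n-\nu_n\|_G = \sup_{g\in G}\Bigl|\sum_{\varphi\in V^\alpha}\Bigl\{\int g\varphi\,d\mu_n - \int g\varphi\,d\nu_n\Bigr\}\Bigr|$$

$$\le \sup_{g\in G}\sum_{\varphi\in V^\alpha}\Bigl|\int g\varphi\,d\mu_n - \int g\varphi\,d\nu_n\Bigr|$$

$$\le \sup_{g\in G}\sum_{\varphi\in V^\alpha}\Bigl\{\Bigl|\int(g - g(x_\varphi))\varphi\,d\mu_n\Bigr| + \Bigl|\int(g - g(x_\varphi))\varphi\,d\nu_n\Bigr|\Bigr\} + \sup_{g\in G}\sum_{\varphi\in V^\alpha}|g(x_\varphi)|\,\Bigl|\int\varphi\,d\mu_n - \int\varphi\,d\nu_n\Bigr|$$

$$\le 2\varepsilon + \sup_{g\in G}\|g\|_\infty\sum_{\varphi\in V^\alpha}\Bigl|\int\varphi\,d\mu_n - \int\varphi\,d\nu_n\Bigr|.$$

Suppose that we can show that for any $\alpha > 0$

$$\sum_{\varphi\in V^\alpha}\Bigl|\int\varphi\,d\mu_n - \int\varphi\,d\nu_n\Bigr| \xrightarrow{n\to\infty} 0.$$

Then the result follows as $\varepsilon > 0$ was arbitrary.

Now note that $V^\alpha$ is countable, so it may be ordered as $V^\alpha = \{\chi_k : k\in\mathbb{N}\}$.
For any finite signed measure $\rho$, the sequence $(\rho(\chi_k))_{k\in\mathbb{N}}\in\ell_1$. We must
establish that $(\mu_n(\chi_k) - \nu_n(\chi_k))_{k\in\mathbb{N}}$ converges to zero in the $\ell_1$-norm.
By Lemma B.2, it suffices to prove that this convergence holds in the
weak topology. In particular, define for every $z\in\ell_\infty$ the function
$f_z := \sum_kz_k\chi_k$. Then it suffices to show that

$$\Bigl|\int f_z\,d\mu_n - \int f_z\,d\nu_n\Bigr| \xrightarrow{n\to\infty} 0 \quad\text{for every } z\in\ell_\infty.$$

By our assumptions this is the case if $\|f_z\|_\infty < \infty$ and $\|f_z\|_L < \infty$. But
this holds for any $z\in\ell_\infty$; indeed, it is easily seen from the properties of $V^\alpha$
that $\|f_z\|_\infty\le\|z\|_\infty$ and $\|f_z\|_L\le2N\|z\|_\infty\sup_{\varphi\in V^\alpha}\|\varphi\|_L$. This completes the
proof. □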

The following corollary is immediate. 

Corollary B.4. For $\{\mu_n\},\{\nu_n\}\subset\mathcal{P}(\mathbb{R}^d)$, the following are equivalent:

1. $|\int f\,d\mu_n - \int f\,d\nu_n|\to0$ as $n\to\infty$ for every $f\in\mathrm{Lip}(\mathbb{R}^d)$;

2. $\|\mu_n-\nu_n\|_{\mathrm{BL}}\to0$ as $n\to\infty$.

It should be noted that for sequences of probability measures in $\mathcal{P}(K)$,
where $K$ is a compact Polish space, the same results can be proved in a
completely elementary fashion; indeed, in this case any uniformly bounded
and equicontinuous family $G\subset C_b(K)$ is relatively compact in the topology of
uniform convergence (by the Arzelà-Ascoli theorem), so that it can be covered by
a finite number of arbitrarily small balls. The previous results then follow
from elementary arguments. When the state space is not compact, however,
the result is far from obvious and relies heavily on the (elementary but
nontrivial) Schur property of $\ell_1$.

APPENDIX C: UNIFORM APPROXIMATION AND CONVOLUTION

In order to verify uniform observability for additive noise observation
models, the following result is of central importance. The result seems to be
of independent interest (it states that the range of a convolution operator on
$U_b(\mathbb{R}^d)$ is uniformly dense in $U_b(\mathbb{R}^d)$ under a mild condition), but
the author was not able to find a statement or proof of this result in the
literature.

Proposition C.1. Suppose that the characteristic function of the
probability measure $\mu\in\mathcal{P}(\mathbb{R}^d)$ vanishes nowhere. Then the family
$\{f*\mu : f\in U_b(\mathbb{R}^d)\}\subset U_b(\mathbb{R}^d)$ is dense in $U_b(\mathbb{R}^d)$ in the topology of uniform
convergence.

The difficulty here is that we seek uniform density of functions on a
noncompact space; as the Banach space dual $U_b(\mathbb{R}^d)^*$ contains elements
which are not countably additive, this precludes the routine application of
the Hahn-Banach theorem (see [14] for related results). We circumvent this
problem by using the elementary properties of convolutions to "push" the
approximation problem into $L^1(\mathbb{R}^d)$, where standard approximation results
are readily available.

Proof of Proposition C.1. We first collect some well-known facts
about convolutions. Let $\varphi\in L^1(\mathbb{R}^d)$ be any function such that $\int\varphi(x)\,dx = 1$.
Define $\varphi^t(x) = t^{-d}\varphi(t^{-1}x)$. Then by [16], Theorem 8.14, we have
$\|f*\varphi^t - f\|_\infty\to0$ as $t\to0$ for any $f\in U_b(\mathbb{R}^d)$. Moreover, for $f\in U_b(\mathbb{R}^d)$ and
$\varrho\in L^1(\mathbb{R}^d)$, we have by [16], Proposition 8.8, that $\|f*\varrho\|_\infty\le\|f\|_\infty\|\varrho\|_1$ and
$f*\varrho\in U_b(\mathbb{R}^d)$. Finally, if $\varrho\in L^1(\mathbb{R}^d)$ and $\nu\in\mathcal{P}(\mathbb{R}^d)$, then $\varrho*\nu\in L^1(\mathbb{R}^d)$ by
[16], Proposition 8.49.

Fix $\varphi\in L^1(\mathbb{R}^d)$ as above and let $f\in U_b(\mathbb{R}^d)$ and $k\in\mathbb{N}$. Then we may
choose $t > 0$ such that $\|f*\varphi^t - f\|_\infty\le k^{-1}$. Now suppose that we can find a
function $\varrho_k\in L^1(\mathbb{R}^d)$ such that $\|\varphi^t - \varrho_k*\mu\|_1\le k^{-1}$. Then we can evidently
estimate

$$\|f - f*(\varrho_k*\mu)\|_\infty \le \|f - f*\varphi^t\|_\infty + \|f\|_\infty\|\varphi^t - \varrho_k*\mu\|_1 \le k^{-1}(1 + \|f\|_\infty).$$

But note that $f*(\varrho_k*\mu) = (f*\varrho_k)*\mu$ and $g_k := f*\varrho_k\in U_b(\mathbb{R}^d)$. Repeating
the procedure for every $k\in\mathbb{N}$, we find a sequence $\{g_k\}\subset U_b(\mathbb{R}^d)$ such that
$\|f - g_k*\mu\|_\infty\to0$. As the function $f\in U_b(\mathbb{R}^d)$ was arbitrary, the result
follows.

It thus remains to show that for every $t > 0$ and $k\in\mathbb{N}$, we can find a
function $\varrho\in L^1(\mathbb{R}^d)$ such that $\|\varphi^t - \varrho*\mu\|_1\le k^{-1}$. It suffices to show that
the family $\{\varrho*\mu : \varrho\in L^1(\mathbb{R}^d)\}\subset L^1(\mathbb{R}^d)$ is dense in $L^1(\mathbb{R}^d)$. To this end,
consider

$$\Gamma_\Phi := \mathrm{span}\{x\mapsto\Phi(x-a) : a\in\mathbb{R}^d\}\subset L^1(\mathbb{R}^d), \qquad \Phi(x) := e^{-\|x\|^2/2}.$$

Evidently $\{\varrho*\mu : \varrho\in\Gamma_\Phi\}$ is the span of all translates of the function $\Phi*\mu$.
But the Fourier transform $(\Phi*\mu)^\wedge = \hat\Phi\hat\mu$ vanishes nowhere, so
$\{\varrho*\mu : \varrho\in\Gamma_\Phi\}$ is dense in $L^1(\mathbb{R}^d)$ by the Wiener Tauberian theorem [25], Theorem
9.5. □
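
To see why the nonvanishing condition on the characteristic function matters, it helps to look at a law that violates it. The sketch below is a hypothetical counterexample introduced here, not part of the paper: for the uniform law on $[-1,1]$, whose characteristic function $\sin(t)/t$ vanishes at $t = \pi$, convolution annihilates $f(x) = \cos(\pi x)$, whereas the standard Gaussian characteristic function $e^{-t^2/2}$ is positive at every frequency. The midpoint rule used for the integral is an implementation choice.

```python
import math

def convolve_with_uniform(f, x, half_width=1.0, n=2000):
    """Midpoint-rule approximation of (f * xi)(x) for xi = Uniform[-half_width, half_width]."""
    step = 2 * half_width / n
    total = 0.0
    for i in range(n):
        y = -half_width + (i + 0.5) * step
        total += f(x - y)
    return total / n

def char_fn_uniform(t, half_width=1.0):
    """Characteristic function sin(w t) / (w t) of Uniform[-w, w]."""
    return 1.0 if t == 0 else math.sin(half_width * t) / (half_width * t)

def char_fn_gaussian(t, sigma=1.0):
    """Characteristic function exp(-sigma^2 t^2 / 2) of a centered Gaussian."""
    return math.exp(-0.5 * (sigma * t) ** 2)

def f(x):
    return math.cos(math.pi * x)

values = [convolve_with_uniform(f, x) for x in (-0.7, 0.0, 0.3, 2.5)]
print(values, char_fn_uniform(math.pi), char_fn_gaussian(math.pi))
```

The computed convolutions are zero up to rounding, so the map $f\mapsto f*\xi$ loses the frequency $\pi$ entirely for this choice of $\xi$.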

As an application, we prove the following result. 

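Proposition C.2. Let $\{\mu_n\},\{\nu_n\}\subset\mathcal{P}(\mathbb{R}^d)$, and let $\xi\in\mathcal{P}(\mathbb{R}^d)$ be a
probability measure whose characteristic function vanishes nowhere. Then

$$\|\mu_n*\xi - \nu_n*\xi\|_{\mathrm{BL}} \xrightarrow{n\to\infty} 0 \quad\text{if and only if}\quad \|\mu_n - \nu_n\|_{\mathrm{BL}} \xrightarrow{n\to\infty} 0.$$

In other words, if $\xi$ is a probability measure whose characteristic function
vanishes nowhere, then the convolution operator $C_\xi:\mathcal{P}(\mathbb{R}^d)\to\mathcal{P}(\mathbb{R}^d)$ defined
as $C_\xi\mu = \mu*\xi$ is uniformly continuous, injective and the inverse operator
$C_\xi^{-1}:\mathrm{Ran}\,C_\xi\to\mathcal{P}(\mathbb{R}^d)$ is uniformly continuous (relative to the $\|\cdot\|_{\mathrm{BL}}$-norm).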

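Proof of Proposition C.2. Denote by $\tilde\xi$ the reflected probability
measure defined by

$$\int f(x)\,\tilde\xi(dx) = \int f(-x)\,\xi(dx) \quad\text{for all } f\in B(\mathbb{R}^d).$$

Clearly the characteristic function of $\tilde\xi$ vanishes nowhere. Now note that we
obtain for any probability measure $\mu\in\mathcal{P}(\mathbb{R}^d)$ the identity

$$\int f(x)\,(\mu*\xi)(dx) = \int(f*\tilde\xi)(x)\,\mu(dx) \quad\text{for all } f\in B(\mathbb{R}^d).$$

Moreover, it is easily verified that $f*\tilde\xi\in U_b(\mathbb{R}^d)$ whenever $f\in U_b(\mathbb{R}^d)$.
Let us first suppose that $\|\mu_n - \nu_n\|_{\mathrm{BL}}\to0$. Then

$$\left|\int f\,d\mu_n - \int f\,d\nu_n\right| \xrightarrow{n\to\infty} 0 \quad\text{for all } f\in U_b(\mathbb{R}^d)$$

as the family of bounded-Lipschitz functions is dense in $U_b(\mathbb{R}^d)$ in the topology
of uniform convergence [12], Lemma 8. Therefore

$$\left|\int f\,d(\mu_n*\xi) - \int f\,d(\nu_n*\xi)\right| = \left|\int(f*\tilde\xi)\,d\mu_n - \int(f*\tilde\xi)\,d\nu_n\right| \xrightarrow{n\to\infty} 0$$

for every $f\in U_b(\mathbb{R}^d)$. That $\|\mu_n*\xi - \nu_n*\xi\|_{\mathrm{BL}}\to0$ follows from Corollary
B.4.

Conversely, let us suppose that $\|\mu_n*\xi - \nu_n*\xi\|_{\mathrm{BL}}\to0$, so that

$$\left|\int f\,d\mu_n - \int f\,d\nu_n\right| \xrightarrow{n\to\infty} 0 \quad\text{whenever } f\in\{g*\tilde\xi : g\in U_b(\mathbb{R}^d)\}.$$

By Proposition C.1, the family $\{g*\tilde\xi : g\in U_b(\mathbb{R}^d)\}$ is uniformly dense in
$U_b(\mathbb{R}^d)$; therefore this convergence holds for any $f\in U_b(\mathbb{R}^d)$. But then
Corollary B.4 implies that $\|\mu_n - \nu_n\|_{\mathrm{BL}}\to0$, and the proof is complete. □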